PREFACE 


This book deals with certain prob- 
lems met in using the methods of science to study behavior. It has 
only one purpose, namely, to aid in training better scientists among 
those who make psychology their subject matter. I think there is 
evidence that we need to train better scientists in psychology; that 
this book will aid in attaining this objective rests only on a faith since 
I have no evidence on the matter. A strong need to train more com- 
petent scientists was felt by the staff at Northwestern University a 
number of years ago and among the steps taken to improve our 
program was the introduction of a course which dealt exclusively 
with research problems. This may or may not have been a wise step 
(introducing new courses cannot be the cure of all our deficiencies) 
and I mention it only to give a specific origin for the book. I was 
given primary responsibility for this course (listed in the catalogue 
as Scientific Method in Psychology) which is required of all graduate 
students in their first year. The present book consists of my lectures 
as they exist at the end of five years during which the course has 
been offered. 

In writing the book (through teaching the course) I felt an obliga- 
tion to reflect current research practices as I saw them, with emphasis 
largely on experimental research. As will be noted, I include under 
research practices far more than the design of an experiment and 
collection of data. Indeed, I have included topics which are distinctly 
controversial, and I have introduced issues which I think have been 
given far less attention than they deserve. The result is that in a cer- 
tain sense the book becomes a philosophy of science. My philosophy 
of science, being as any philosophy is, a personalized affair, may not 
have allowed me to set down a true picture of the contemporary 
research scene. But, even if I were so unbiased that I could accurately 
reflect this scene, there are so many matters which are controversial 
and on which I found it necessary to take a stand, that I fully expect 
to be disagreed with at several points. If I did not believe that the 
training of research workers would benefit from further discussion 
of these controversial matters, I would never have submitted these 
materials to the inspection of others. 

That this work is a philosophy of science can only be said 
with an apology to those who are by profession philosophers of 
science, for this book is at best a pragmatic philosophy and may seem 
naive to them. It is a pragmatic philosophy because it is concerned 
with issues and problems more closely related to the actual doing of 
research than are the issues and problems handled by the philosophers 
of science. Only at a few points do the problems clearly overlap those 
commonly found in the writings of the philosophers of science. 

My debts are many. For over 10 years I have been privileged to be 
at a university which encourages and facilitates teaching and re- 
search in line with the finest traditions of our great educational in- 
stitutions. I have also had the good fortune to be a member of a small 
department dedicated vigorously to research and teaching. Only a 
few of the present chapters have been read critically by my col- 
leagues but I suspect every topic in the book has been discussed with 
me at one time or another by at least one associate. I mention this 
because while I recognize a real debt to my colleagues as a result of 
these discussions, it may be greater than I realize. The remarks which 
I boldly set forth as my own may have actually been germinated by 
one of my colleagues but the passage of time has obscured the source. 
Yet, it may be a blessing, for I know that my position on some matters 
is not popular and to attempt to give credit where the source is 
questionable might result in injustice. 

Professors R. M. Elliott and Kenneth MacCorquodale have criti- 
cally read the entire manuscript. They, too, have disagreed with my 
position on some issues but have left the final decisions to me. I owe 
both much for smoothing and tempering my prose. 

Students who have listened to my lectures or read some of the 
materials have pointed out ambiguities and inconsistencies which I 
have tried to correct. Many of the illustrations in Chapters 3, 4, and 
5 are taken from student reports. Mrs. Irene Nolte has typed the 
manuscript and has eliminated inconsistencies in the format. 

Finally, I wish to thank the following publishing firms for allow- 
ing me to quote material: University of Chicago Press; The Dryden 
Press; Appleton-Century-Crofts, Inc.; The Journal Press; Cambridge 
University Press; John Wiley & Sons; American Psychological As- 
sociation; American Journal of Psychology; American Journal of 
Physics; American Scientist; Science. 

Introduction 


SOME GENERAL COMMENTS ON SCIENCE AND 
PSYCHOLOGICAL RESEARCH 


The purpose of the methods of 
science is to achieve a description and understanding of nature (the 
universe). By description I mean the definition, cataloguing, or 
classification of events, objects, and phenomena which define nature, 
and the statement of empirical relationships associated with these 
events, objects, and phenomena. By understanding I mean the re- 
duction to the smallest possible number of general laws which would 
account for the various specific facts. The descriptive part of science 
is concerned with research per se; what I have called understanding 
is usually achieved through theory. 

This particular book is concerned with the scientific method as a 
means of studying behavior, particularly by those who call them- 
selves psychologists. I will try to reflect faithfully various research 
practices in psychology; but, in spite of the manifest enthusiasm 
which I have for my profession, I find a great deal to criticize in 
these research practices. When I am critical it is in the interests of 
betterment of psychological research, not because of any over- 
powering urge to censure. For certainly, it seems to me, we need to 
maintain a continuous review or inspection of the attempts to apply 
scientific method to the study of behavior. Some of these attempts 
make science look ludicrous and they must be evaluated for what 
they are. Probably there is no other area of human endeavor which 
so badly needs a thoroughgoing application of the scientific method 
as does psychology, for probably in no other area are there so many 
misconceptions, so many half-truths, and so many abortive attempts 
to understand behavior. 

The social forces following World War II markedly increased the 
number of psychologists in positions where (among other things) 
they are attempting to minister to the mental ills of mankind. Such 
ministrations are severely and perhaps critically hampered by an 
almost complete lack of relevant behavioral principles eventuating 
from research. To some this is a frightening situation. Whether the 
society of psychologists should have allowed themselves to be drawn 
into this situation is a complex and controversial matter; it is not 
my intent to debate the issues. But, the appalling schism between 
the facts of psychology and what many practitioners are trying 
to do is an issue, for the breach can only be reduced by more and 
better research. If hundreds of psychologists choose to work in 
industry, clinics, guidance centers, and so on, and if the profession 
is to survive as a respected one, there seems to be no answer except 
sound research, whether this research is done in the applied setting 
or in the universities. 

We at the universities where graduate work is done are almost 
entirely responsible for the training of research people. We are 
largely responsible not only for the quality of research work but 
also for whether or not it is done. We cannot escape the responsi- 
bility we have of inculcating the highest standards of research in 
our students as well as training more students for research careers. 
We must not only institute these standards but we must also con- 
tinuously police ourselves against any lowering of them. Standards 
of research are not static; the highest type of research standards at 
any given time almost inevitably leads to a subsequent raising of the 
standards; good research breeds better research. Society must not 
only be protected against the practitioner who operates without the 
leavening influence of principles derived from research but likewise 
must be protected against the shoddy effects of ill-conceived and 
grossly misinterpreted research. 

It is apparent that universities vary considerably in standards of 
research for their students, and the diverse standards are disseminated 
in turn by these students to their students. I do not think there is a 
middle ground on these matters; psychological research is an honor- 
able profession and training for such a profession must be at the 
highest level we know. An examination of current research reported 
in our journals shows that the essential aspects of the scientific 
method are still largely foreign to some psychologists. The critical 
aspects of this book, therefore, stem from a faith that there must be 
ways by which general standards of research practices can be ele- 
vated, at least to some degree. 

Now, you may ask: “Why this fixation or fetish on the applica- 
tion of scientific method to psychological problems?” The answer 
is that no one has conceived of a better way for demonstrating and 
understanding the lawfulness of nature. I, therefore, believe that we 
should promote with all our vigor the appropriate use of these 
powerful tools of understanding. I would not, of course, deny the 
right of novelists, poets, artists, or indeed, metaphysicians, to record 
their interpretation of human behavior. On the contrary, I would 
defend such a right so long as it is accurately described for what 
it is and the interpretation clearly distinguished from those based on 
scientific method. At least at the present time, the word “science” 
seems to have a certain prestige value, and we find the most curious 
activities masquerading as scientific endeavors. The record should 
be kept clear. 

Some other preliminary remarks need to be made to set the tone of 
this argument. I will not engage in quarrels about the philosophical 
bases of science, about its social implications, nor its evils and virtues. 
In some instances I will make some assertions about these matters 
if I think they clarify subsequent material. I shall make no attempt 
to defend science and scientists against certain criticisms; if this 
need be done, it has been done (e.g., 2, y, 7). My basic premise is 
that scientific research in psychology (as well as other disciplines) 
is a vital part of man’s ever-extending endeavor to comprehend the 
universe. I wish merely to discuss critically some of the problems 
of research in psychology as I see them. 


THE ASSUMPTIONS OF SCIENCE 

Probably not many scientists are able to formulate adequately the 
assumptions which logically underlie their labors. Moreover, it is 
likely that the average scientist has not done much thinking about 
these assumptions, for he can do perfectly good work without it. 
However, since these issues sometimes plague the curious student of 
science, I want to discuss what I consider the two basic assumptions 
of science. 

Determinism. One of the assumptions of a scientist is that there is 
lawfulness in the events of nature as opposed to capricious, chaotic, 
or spontaneous occurrences. Every natural event (phenomenon) is 
assumed to have a cause, and if that causal situation could be exactly 
reinstituted, the event would be duplicated. In the strictly physical 
world, determinism would probably be accepted by all, scientist 
and layman alike. Apples do not drop up one day and down the 
next; gasoline engines do not run without fuel; rain does not fall 
unless there are clouds in the vicinity. In short, there is a predict- 
ability (reliability) of the events in the physical world, and few 
would disagree that appropriate search would be likely to find the 
particular conditions under which the events occur. 

Now certainly the awareness of orderliness and lawfulness in 
nature is not a product of modern science. Early man noted regu- 
larities such as the changes of the seasons and the growth of animals 
and plants under certain conditions and not under others. Science 
has merely allowed us to pyramid these regularities systematically 
and go back of apparent causes to more basic causes or correlates 
(antecedent events or causes) ; in so doing it has allowed us to bring 
many phenomena under a single causal principle. It has, in effect, 
ordered the orderliness of nature as distinct from casual observations 
and the unrelated interpretations of common sense. Furthermore, 
science discovers orderliness about phenomena which are not readily 
apparent to the human senses. 

So, in general, the principle of determinism is an underlying 
assumption of the scientist, and for the apparent physical world is 
widely accepted. Even here, however, the acceptance is not uni- 
versal, especially where topics such as the origin of life, or the doc- 
trine of evolution are concerned. That the scientist does not always 
realize his acceptance of the principle of determinism probably stems 
from the obviousness of it (to him). It is taken so thoroughly for 
granted that verbalization of it would appear redundant. He applies 
his methods of science and time after time finds the orderliness of 
nature of which we have spoken. Indeed, even if he found chaos in 
a given area of nature, it is quite likely that he would not imme- 
diately, or indeed soon, question the assumption of determinism. 
Rather, he would look to his investigative procedures for the reasons 
behind the discovery of chaos instead of orderliness. Since science 
has found orderliness decade after decade and in subject matter after 
subject matter, the scientist is prone to believe that all events of 
nature, be they events characteristic of stones, oceans, angleworms, 
ministers, corpuscles, or nerve tracts, have discoverable correlates. 
But, whether or not the scientist has thought about or verbalized 
this assumption is basically irrelevant to his work as a scientist; it is 
quite possible for him to carry on excellent scientific work without 
ever having heard about determinism. (See Benjamin, 1, for a more 
complete discussion of why the scientist may not pay much atten- 
tion to this assumption.) 

When we turn to the problem of determinism in human behavior, 
the principle is not so easily accepted by all. There are people, edu- 
cated and uneducated, prominent and obscure, who do not hold to 
the doctrine of determinism in human behavior. Certain religions 
can accept it up to a certain point only to abandon it beyond that 
point for other explanatory principles. I will not argue these matters; 
the interested reader is referred to a paper by Grünbaum (4). It is 
sufficient to say that to reject determinism for a part or all of human 
behavior is in a sense to reject the application of scientific methods 
to the study of human behavior. Rejection of such a fundamental 
premise at this stage of development of psychology is decidedly pre- 
mature, for application of scientific methods by psychologists has 
already revealed a pattern of orderliness in behavior and many cause- 
effect relationships commensurate with the age of psychology as a 
science. There is plenty of room for pessimism about how rapidly the 
application of scientific method to the study of behavior will reveal 
all cause-effect relationships which are necessary for a fairly complete 
understanding of human behavior, i.e., to reach a stage that is 
roughly equivalent to the knowledge achieved by physicists. Even 
those of us who are the most ardent advocates of the use of the 
scientific method may have rare moments of despair when we 
realize how little progress will probably be made in our own life- 
time toward a thorough understanding of the behavior of the human 
child or adult. But the motivation for discovery, whatever its 
source, is remarkably resistant to extinction. Basically the research 
psychologist knows, however personally important his research may 
be to him, that his lifetime contribution to understanding behavior 
will be small. Yet in spite of this and in spite of the small extrinsic 
rewards likely to accrue from his efforts, he retains an unshakable 
belief in the doctrine that all behavior, simple or complex, is deter- 
mined by discoverable causes and will eventually yield to the 
methods of science. Determinism is a necessary assumption for the 
scientific enterprise. 

Finite causation. A second general assumption made by the scien- 
tist is that every natural event or phenomenon has a discoverable 
and limited number of conditions or factors which are responsible 
for it. For, as Pap (10) indicates, science would be almost a hopeless 
undertaking if nature were so constituted that everything in it 
influenced everything else. 

This assumption need not be dwelled on. The length of an 
astronomer’s toenails doesn’t influence the phases of the moon; the 
color of the secretary’s hair doesn’t affect the height to which the 
corn grows in an Iowa field, and a Pygmy tribe in New Guinea has 
little influence on the alcoholic consumption of a truck driver in 
Brooklyn. 

Specific assumptions. There are many less general assumptions 
with which the scientist must deal in his day-to-day work that are 
probably more important from a pragmatic point of view than are 
the more general assumptions. Speaking now only of psychological 
research, there are many different methods by which a given prob- 
lem may be attacked. Each of these methods involves certain assump- 
tions, some common to all the methods, some unique. There are in 
addition the omnipresent assumptions dealing with sampling and 
statistical analyses. The research psychologist must weigh the 
seriousness of failing to meet certain assumptions; he must evaluate 
which method violates assumptions least, if at all; he must, in short, 
choose the method which has the greatest probability of meeting the 
assumptions of acceptable research procedure. I merely mention 
these matters at this point, and will deal with them no further here, 
for these are major problems which are reserved for later discussion 
in another section of the book. 

THE HISTORY OF SCIENCE AND THE HISTORY 
OF THE SCIENTIST 

Scientific work is an unending series of analytical steps. If we ask 
about our understanding of the complete order of the universe, the 
series of analytical steps extends beyond our vision. Such a statement 
is real in the sense that it reflects the history of science, and, using 
this history as a base for projection into the future, we see no other 
picture except this series of analytical steps. But such a statement 
gives not the slightest hint of science as it appears in concrete form 
to the working scientist. There is something very cold, forbidding, 
and uninspiring about a description of scientific work as an infinite 
series of analytical steps. But, to the working scientist, to the man 
who takes his research seriously, it becomes at once the most stimu- 
lating, frustrating, exciting, discouraging occupation imaginable. 
The few analytical steps successfully taken by a scientist in his 
lifetime are interlarded with defeats, misconceptions, and bumbles. 
The analytical steps are the fruits which appear in the history of 
science. Only a detached historian, looking at science and not at 
the scientist, can describe science as calculated and cold. There is 
nothing chilling, ruthless, nor inexorable about the march of science 
to the scientist as he works. Science in practice is full of dead ends; 
plenty of its great discoveries occurred as if by accident; it has its 
share of blunderers as well as men of brilliance. Put the positive 
products of these men's endeavors together in the history of science 
and we have the long series of what might appear at times to be 
capriciously generated analytical steps. 

The process of discovery has been as varied as the temperament of 
the scientists. Of course, individual research projects are as a rule 
unspectacular, within their small scope fairly routine and logically 
consistent; but precisely some of the most important contributions 
have initially depended on wrong conclusions drawn from erroneous 
hypotheses, misinterpretations of bad experiments, or chance dis- 
coveries. Sometimes a simple experiment yielded unexpected riches, 
whereas some most elaborately planned assaults missed the essential 
effect by a small margin. Great men at times had all the “significant 
facts” in their hands for an important finding, and yet drew trivial or 
wrong conclusions; others established correct schemes in the face of 
apparently contradictory evidence. Even the work of the great heroes, 
viewed in retrospect, sometimes seems to jump from error to error 
until the right answer is reached as if with the instinctive certainty of 
a somnambulist; indeed, this gift must be one of the deepest sources 
of greatness (^j p. 90). 

The student who decides to be a research scientist and who starts 
his work in that direction may have a misconception of the rewards 
of his occupation. Many of us thought as we first started doing re- 
search that next Monday, or next semester, or surely next year we 
would make revolutionary discoveries or have revolutionary insights. 
It seldom happens. We must be satisfied, indeed, be proud of little 
insights and little discoveries, for these are the lifeblood of a science. 

The growth of science . . . does not depend only on a few great dis- 
coveries. It depends equally on that slow accretion of multitudinous 
small steps that furnish the bases for, and the necessary extensions of, 
those discoveries, and also on the correction of the even more numer- 
ous missteps continually being made (S, p. ay). 

To have an original thought, to perceive a new relationship, to com- 
prehend a complex relationship, or, to plot out a well-controlled ex- 
perimental design— to do these things and to receive personal satis- 
faction for doing them make the life of a scientist. If a person cannot 
work long hours at research without feelings of martyrdom, science 
is not his occupation. 

PURE AND APPLIED RESEARCH 

In this section I wish further to set the temper of the chapters to 
follow by making certain assertions about problems which con- 
tinually arise around the dichotomy of pure versus applied research. 
These problems seem always to have been present in varying degrees 
even among scientists living in a strictly academic atmosphere. But, 
with increasing sponsorship of research by government agencies, the 
problems have extended into the governmental administrative, hence 
political, domain. And of course, the issues have ever been present as 
a consequence of the hiatus in the understanding of the layman of 
the scientist’s motives. In discussing this issue I shall first try to make 
clear what I mean by the essential terms I will employ in this dis- 
cussion. 

I use the terms “pure” and “applied” merely to identify the ends 
of a crude continuum. This continuum is defined by the attitude 
of the research worker. At the applied end of the continuum, we 
have the research worker who asks himself questions about the man- 
ner in which the world (nature or social order) is functioning and 
does research concerned with these questions only if it appears that 
the product of his research will clearly and immediately modify the 
way in which the world is functioning. At the other extreme is the 
investigator who asks himself questions about the world, questions 
about why nature behaves as it does, and sets about to get the an- 
swers without any concern that they may be used to change the 
world. All this pure research worker wants to do is understand the 
world. In between these extremes, of course, are gradations. With- 
out doubt there are many research workers who ask themselves 
research questions as a result of a basic curiosity about nature and 
then further ask what relevance the answers to such questions might 
have in changing the world. Whether they proceed with the re- 
search or not depends on the values they place on the two aspects 
of the problem. And of course, a man need not occupy a static 
position on the continuum; he may range as his interests and values 
change or, as during a war, when emergencies demand it. 

Another facet of the pure-applied problem is that presented by 
the technologist. The technologist is one who applies the results of 
research; he uses the knowledge gained by research to change the 
world. The technologist may be a different person from the research 
worker, but it is also quite obvious that he may himself be engaged 
in research. That is, we may well have a research worker with a 
strong technological bias; he does the research and also applies the 
knowledge to effect some change in the world. 

Now it should be clear that the intent of this book is to discuss 
research methods and procedures per se, whether this research is 
applied or pure. But, I find it necessary so frequently to defend 
freedom of inquiry in general, that I must make a number of other 
comments about this subject, since it bears directly on the applied- 
pure problem. 

Freedom of inquiry is a reflected but integral part of our Consti- 
tutional liberty in the social order; it has been singled out, fostered, 
and protected largely by the great universities of our country. I 
suppose that it may be difficult for the average layman to compre- 
hend just how much freedom of inquiry means to a scientist. And I 
think also it is hard for the scientist to express to the layman why it 
is such an essential component of the research atmosphere. When 
the scientist may pursue his work, wherever it may lead him (pro- 
viding no harm befalls others during the pursuit), without having 
to answer the question, “what good is this?”, that is what I mean 
by freedom of inquiry. Such a situation is still maintained at most 
of our universities. I have been at Northwestern University for 10 
years; never once have I heard of research being questioned by a 
dean, or other administrative officer, or a colleague, because the 
research worker had no answer to the question “what good is this?”; 
indeed, the question is never asked. The research might be ques- 
tioned on a number of grounds, such as methodological adequacy; 
but never does the man have to defend his work against the charge 
that it has no immediately foreseeable application. It is this guardian- 
ship of freedom of inquiry which to many is the most magnificent 
tradition of our universities. 

Seldom are direct, frontal attacks made on this freedom. The 
attacks, when they occur, are neither calculated nor obvious but 
nonetheless are to be reckoned with. Recently, for example, a book 
appeared dealing with methods of research in education, psychology, 
and sociology (5). Let me give some quotes from this book, which, 
in many respects has my admiration but which on this matter 
frightens me: 

This criterion of importance, in choice of a problem, involves such 
matters as significance for the field involved, timeliness, and practical 
value in terms of application and implementation of results (p. 54). 

Scientific work in education, psychology, and the social sciences in 
general has an especially urgent obligation to play a social role in 
rendering service to society and humanity (p. 54). 

It is high time that the social responsibilities of scientists and of 
research workers be recognized and accepted (p. 54). 

The research worker is not expected, as a general rule, to implement 
the results of his studies, however desirable this consummation may be. 
He is not even compelled to point out the practical application of his 
findings, although this step seems essential, especially in the social 
sciences (p. 55). 

It is apparent, in my opinion, that such teachings can have a con- 
stricting effect on the very freedom that allows them to be pub- 
lished. But, let us set the record clear. Our society has a perfect right 
to expect scientists to be good citizens, to be loyal to the institu- 
tions under whose protection they live, and, in general, to be like 
any other citizen on these matters. But this is quite a different issue 
from saying that social scientists should be spending their time in 
research which will solve the social and political problems of the 
world. Society has always had problems crying for solution; science 
has helped solve many such problems. But, it would be a curious 
contradiction and a patently dangerous situation if science must 
heed every demand made upon it. Not only would such a turn of 
events be contrary to freedom of inquiry but it might be extremely 
shortsighted. The urge to succor the momentary ills of society 
springs from a noble motive, but that it is the best means of eliminat- 
ing the ills of generations yet unborn may be doubted. The scien- 
tist’s fundamental responsibility to society is to utilize his freedom 
of inquiry to the utmost, pursuing his researches wherever they may 
lead him in his field of competence. 

It seems to me, therefore, that the evaluation of the importance 
of a piece of research will depend on the philosophical and on-the- 
job contexts which prevail. Private industry might evaluate it in 
terms of a step up in production; a defense department admini- 
strator on the basis of whether it would be of value in the training 
of new recruits; a university professor in relation to the soundness 
of its approach and how much it advanced our understanding of 
nature. There is no universal answer to the question of whether or 
not a piece of research is important. There are purely administrative 
decisions concerning it, and these will differ depending upon the 
philosophical convictions and values of the person making the 
decisions. 

Without doubt it is the extreme purist in research that laymen 
find most difficult to understand. He clearly is a fellow who is 
interested in knowledge for knowledge’s sake. If someone wants to 
make something practical out of his work, i.e., if a technologist uses 
his results, he has no objection but he isn’t interested in doing it 
himself. The history of science shows that these purists have, un- 
wittingly, made many fundamental contributions to our social order. 
One could list many names, such as Mendel and Faraday, who 
worked for the sheer pursuit of knowledge, or understanding of 
nature but whose discoveries were applied later by others in a 
practical manner. One can easily imagine situations in which pure 
research could make dramatic contributions to the welfare of the 
world. Suppose, for example, that in the biology department there 
is a fine old professor who has spent his entire life studying butter- 
flies. In most parts of the world butterflies have little impact on the 
order of nature; they don’t harass the farmer’s crops; they don’t 
seem to be needed to maintain balance in the insect world; at very 
best their contribution is an aesthetic one. But, supposing someone 
should discover that the polio virus is transmitted by butterflies. 
At this point the exact and detailed knowledge— the pure knowledge 
for its own sake— would become tremendously important socially, 
its application of the highest importance. Butterflies could be 
brought under control almost immediately because complete data 
were available on their reproductive habits, life history cycles, and 
so on. 

Let us not, therefore, be hasty in evaluating the worth of any 
research; what may seem to be pure and socially worthless today 
may become highly significant tomorrow, next year, a hundred years 
from now, or perhaps never. But because the pure research worker 
may, by his experiments, discover fundamental facts of nature, we 
must maintain the institutions in our society which will encourage 
such research. It is the universities, given support for pure research 
by philanthropic foundations and certain agencies of the govern-
ment, which will continue to be the chief protectors of this unre-
stricted form of inquiry.

I am sure there are those who extol the virtues of pure research
and who loudly proclaim their right to do it, not from the basic 
desire to seek knowledge per se, but as a socially acceptable cloak
behind which to retreat from reality. But, even so, while we might 
condemn such intellectual snobbery and detachment, highly signifi- 
cant research might actually be accomplished by such a person. 

I do not think we can deny the contributions made to our
social order by pure research. Nevertheless, we could certainly ques-
tion whether or not this insistence on the right to do pure research
is an efficient way for science to proceed. We don’t know, for ex-
ample, how many pure researches have been done which will never
have any practical value and which are not particularly important 
for understanding of nature; undoubtedly there are thousands. It is 
quite possible that the application of scientific findings to the solu- 
tion of world problems, whatever they are, might be much further 
developed if all scientists had a strong technological streak in them. 
I do not know how such questions can be answered with assurance. 
Many respected scientists (e.g., Oppenheimer, y) firmly believe 
that we need both the extreme purist and the extreme technologist 
for most rapid progress; the two complement each other. Of one 
matter we may be sure— as long as the atmosphere of our society 
remains free as it is today and as long as bigoted men never find 
their way into offices of power over research, we will continue to 
have research workers at all points on the continuum I have de- 
scribed. The degree to which the research done by a man is pure 
or applied depends upon his attitude, and this attitude is a product of
our culture. As long as our culture tolerates, no, not tolerates, but 
fosters this diversity of attitude we will continue to have the great 
range in the nature of research. I think it should be that way, not 
because I necessarily believe that it speeds up acquisition of scien- 
tific knowledge or that it may lead more rapidly to social progress. 
These questions have no answer for me. But, I would want this to 
continue because I think freedom is a fundamental premise of 
science as well as government.


PREVIEW

In completing this introductory chapter I will give a fairly ex-
tended preview of the topics to be covered in the subsequent
chapters. 

I shall talk about phenomena and laws about those phenomena
as being the basic data with which psychologists work. Many of
these phenomena are given specific names, such as color shock, 
intelligence, extinction, brightness contrast, pitch, stereotypes, and 
so on. But whether named or not, I shall simply call all of them
phenomena. The basic purpose of research in psychology is to dis-
cover phenomena, variables which affect them, and the lawfulness
of the effects. In the next chapter, therefore, we shall set up a simple 
conceptual system around the research situation so that various 
aspects of the situation can be discussed. We shall see that various 
components of the situation may in themselves provide special prob- 
lems of research. 

In our initial research attempts in an area, one of the first stages 
is to demonstrate a reliable phenomenon (or phenomena) and give 
it an operational definition. Since the operational definition forms 
the base of any scientific inquiry, we will need to spend consider- 
able time on the matter of constructing or formulating these defini- 
tions. We shall further discuss their limitations and implications. 
My particular way of viewing an operational definition is that in 
simple form it becomes an experimental design. With this as back- 
ground we will then move wholly into the area of research design. 

The material to be presented on research designs will be both 
expository and critical. The procedure will not be that of present- 
ing in detailed form the acceptable research designs for the many 
types of problems on which psychologists work. These are avail-
able in a number of sources. Rather, I will look at general types of
research designs which are used, with but little attention to specific 
variations needed for particular research problems. I will not be 
concerned to any extent with the statistical problems of research 
design. The major effort will be a cataloguing of major research 
errors which are being made today, all of which will be extensively 
illustrated. It is apparent, then, that I will attempt to teach thinking 
about research design by first pointing out errors that are frequently 
found in the literature and then showing how they can be avoided. 
When this material is accompanied by a search by the student for 
such errors in published literature, I have found it to be highly
instructive providing one does not allow the negative aspects to over- 
shadow the basic fact that good sound research can be and often 
is done. 

The material covered thus far in this preview of chapters to come 
is largely concerned with the descriptive aspects of our science, that 
is, with the problems associated with the discovery of phenomena 
and the working out of variables related to them. The critical
material on research designs is concerned with the establishment of
reliable phenomena (and laws about them) and avoiding pseudo- 
phenomena. Now, as indicated earlier, the second general province 
of science is that of explanation, theory, or understanding in a more 
comprehensive way than that given by the immediate data. While 
there is much disagreement among philosophers of science, and 
scientists too, as to just what theory is, there is fairly general agree- 
ment on the objective of explaining by means of theory. This is to 
say that explanation is the reduction of all laws or relationships to 
as few as possible independent basic concepts and assumptions. It is, 
in effect, showing that specific or detailed empirical findings are
special cases of (can be deduced from) more general laws. But, the
methods and terminology used in carrying this out in psychology 
have produced a host of problems. Theoretical or explanatory 
attempts are extremely unstructured; there is an appalling lack of 
agreement on terminology; there are unresolved problems on when 
an empirical concept becomes a theoretical concept and vice-versa. 
There is the further problem of viewing the theorist as a human 
being, with scores of orienting biases. Nobody is as invulnerable as 
a theorist who does no research, and nothing is as impregnable as a 
theory which suggests no research. At the same time, nothing is so 
fatal to a theory as a well-ordered set of empirical relationships. 

I must say quickly that I make no pretense of bringing order out 
of the chaos. My only hope is that we can develop a set of standards 
or a point of view by which we can approach this chaos with some- 
what greater understanding. And of course, my concern will not be 
with assessing any particular theory and its relationship or adequacy 
to a particular subject matter. Rather, I shall use illustrative theo- 
retical formulations to demonstrate the diversity of approach which 
is occurring in the attempts at explanation. 

Finally, I have set aside a concluding chapter for presenting a 
number of ideas on research which have not been covered in other 
sections. Perhaps the word ideas is inappropriate; perhaps the term 
biases is more accurate. But, since the editors of this book have not 
seen fit to strike them out, they may at least serve the purpose of
generating fruitful arguments.


Analysis of the Research Situation


Among the products of the scien-
tist’s work are certain conceptual structures around which he orders
his thinking. They seem to help him keep his ideas and facts in a tidy 
state of affairs. One such simple conceptual structure which has had 
widespread repetition in psychology is the S-O-R conception.
These letters may be thought of as standing for the gross components 
of the research situation in psychology, namely, stimuli, organisms, 
and responses. The stimuli may be distal or proximal; the responses
immediate or developmental in nature. This gross analysis need not 
necessarily imply any strong theoretical or orienting bias; but, psy- 
chologists of different interests place varying emphases on the three 
components. Some psychologists look for functional relationships 
between stimuli and responses with comparative disregard for the 
relatively permanent capacities of the organism. Personality theorists, 
on the other hand, are largely concerned with identifying and char- 
acterizing these organismic capacities and traits. The physiological 
psychologist often takes the stimulus-response relationship as a start- 
ing point as he searches for the physiological mechanisms mediating 
the relationship. Regardless of our particular research interests or of 
our particular penchants for types of research tools, all three com- 
ponents of the research situation are important and will continue to 
be important in our quest for comprehensive laws of behavior. For 
the presentation here, where the emphasis is on experimental method-
ology, all three components must be discussed in detail. We will start
with the response. 

RESPONSES 

Human activities, behavior, or more simply, responses, constitute 
the universe of phenomena which psychologists describe and attempt 
to understand. I do not wish to put any restrictions on the magnitude
of the response with which we will deal; it may be thought of as
being as “small” as an eyeblink or as “large” as social interaction. It 
has been common to speak of responses as being glandular and mus- 
cular activity. But, we do not often measure responses at this level; 
rather, we measure products of glandular and muscular activity. We 
note a verbal response; we count pencil marks; we measure latencies. 
The facts of the case are that the great bulk of the data of current 
psychological research reflects a gross but strict behavioral analysis, 
i.e., what the individual or group accomplishes or does, not what 
muscles are used nor what the chemical action of the cells is. This 
is not universal; Guthrie (/;) has made strong pleas for actually de- 
scribing muscular movements with comparative disregard for what 
these movements may accomplish. Many physiological psychologists 
are concerned with responses at the strict physiological level, such as 
nerve discharges or thyroid activity. Nothing is sacred about any 
level of description. Nevertheless, certain levels of response analysis 
seem to be more fruitful or useful at a given stage in the development 
of a science than do others. Or, it may not be that they are more 
useful but only that they are more in vogue, or perhaps, easier to 
accomplish. It should be clear that for me to say that the great bulk 
of psychological research is conducted at the gross behavioral level 
does not make it “right.” I do not know how to judge the “rightness”
or “wrongness” of such an issue; I am stating merely what seems to 
me to be a fair appraisal of how most research psychologists are 
behaving today. 

SCALES OF MEASUREMENT 

It would be presumptuous (and pointless) to attempt to catalogue 
the myriads of responses which form the raw data of the psychol- 
ogist and define behavior as he studies it. One needs only leaf 
through a few representative journals to realize the ingenuity shown 
by psychologists in selecting behavior segments for study. I will not 
at this point, nor at any later point, engage in disputes concerning 
the significance of the multifarious responses studied by psychol- 
ogists. That is, a criticism which has been levied at psychologists 
(sometimes by psychologists, e.g., 2) is that the responses they study
are not really important; they are not the responses which represent
the behavior of an army general weighing the prospects before send-
ing armies into battle; they are not the responses involved in a lynch-
ing or a revival meeting; they are not reflective of the behavior of a
Senate floor leader trying to get a bill through the chamber. How-
ever, I would defend the proposition that research in psychology
necessarily involves measurement, and that the rapidity with which 
research will embrace these so-called significant behaviors depends 
on our ability to break them down into relevant parts which can be 
measured. 

There are a number of sources in which one can find detailed
evaluation of scales of measurement extant in psychology (e.g., 6,
22), and no attempt will be made here to reproduce these discus-
sions. Rather, I will simply indicate the diversity in the nature of
scales in use and note a few points most pertinent for subsequent 
discourse. 

The crudest level of response measurement used in psychology is 
that of identifying responses as belonging to one of two mutually 
exclusive categories. The basic data, therefore, consist of the fre- 
quency of responses in each category. Thus, we might make a tally 
of the number of students who did pass a particular course and the 
number who did not. Or, we might count the number of people who 
have visited a psychiatrist and the number who have not. This is 
measurement in its crudest sense, but nevertheless it is fundamental 
to all more precise forms of measurement and has itself been used 
many, many times in psychological investigations. Now, while it 
might be apparent that research in psychology is not concerned 
primarily with such counting, as an end activity, it may be worth- 
while to make this explicit. I suppose that acquiring the knowledge 
that 85 per cent of the populace have never visited a psychiatrist 
while 15 per cent have has a certain significance in and of itself. But, 
a research psychologist would be interested in discovering in what 
other characteristics the two classes of individuals differ. That is, 
what are the correlates of visiting or not visiting a psychiatrist? Of 
course, in this illustration, one obvious difference would be expected, 
namely, mental illness. But, there may be a whole host of other fac-
tors which are related, such as family background, financial status,
age, and so on. In short, defining a response measure (behavior) is
only an early step in research, and the two-category type of response
measure is the crudest used. 
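
The step from counting to looking for correlates can be sketched in a few lines of code. The sketch is only an illustration: the tallies are invented, the second classification (high versus low income) is an assumption chosen for the example, and the phi coefficient is one common index of association for a two-by-two table, not a procedure prescribed in this chapter.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2x2 frequency table [[a, b], [c, d]]."""
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return (a * d - b * c) / denom


# Hypothetical tallies (not data from any study):
# rows    = visited a psychiatrist / never visited
# columns = high income / low income
visited_high, visited_low = 30, 20
never_high, never_low = 120, 230

phi = phi_coefficient(visited_high, visited_low, never_high, never_low)
print(f"phi = {phi:.3f}")
```

A phi near zero would say that the two classifications are unrelated in these tallies; a value well away from zero would mark the second classification as a possible correlate worth pursuing.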

Now note that assigning organisms (as a result of their responses)
to one of two classes may involve only a single response; indeed, 
in the simplest case they are assigned on the basis of the presence 
or absence of the response. Members put in the same class may differ
markedly on many other responses. For example, to take an extreme 
case, assume that for some esoteric reason a research worker in
speech wanted to assign organisms to one of two classes depending 
upon whether they can or cannot be taught to speak. Thus, people 
and parrots would fall in the same class. 

I have said that classifying responses into one of two categories 
is the crudest form of measurement. Once we go beyond two cate- 
gories we begin to introduce a refinement in our response measures, 
for the members of a given class become more homogeneous. For 
example, we might classify all responses on a Rorschach test as form 
responses and not form responses. But, we might go further and 
classify them in terms of color, form, movement, or none of these. 
As our categories become greater and greater in number the re- 
sponses in those categories become more and more homogeneous. If 
you like a name for this simple classifying-type measurement, the 
term nominal is commonly used (22). 

Let us turn next to the other extreme. If a response measure is 
recorded along a physical or ratio scale we have the most advanced 
form of measurement. Thus, length, weight, time, and so on, are 
measured along such scales. If reaction time is measured, it is in terms 
of a ratio scale; kilograms of work per unit of time reflects the use 
of two ratio scales. Such scales, it will be realized, have a true zero 
and theoretically can be broken down into an infinite number of
equal units.

In between these two extremes of precision of measurement we
have several gradations of precision. Let us note first that in the 
nominal scale (yes or no, is or ain’t, classification) we simply report 
the presence or absence of a response. Implicit in most of these classi- 
fications is the idea of magnitude of response. Suppose we classified 
adults as teachers or not teachers. In a certain sense, when we do 
this, we are saying that those we classify as teachers have a positive
amount of “teacherness” and the other group has none. Our response
measure may make this idea of magnitude explicit when responses are
rank ordered in terms of their amount or magnitude. For ex-
ample, we could rank-order 10 teachers as to their general teaching
ability. This doesn’t tell us anything about how much better Teacher 
A is than Teacher B, nor B than C, but the idea of magnitude or 
amount of teaching ability becomes explicit by such a measuring 
instrument. As we advance further up toward the physical scale, 
we may acquire instruments which will not only rank order but
which will also give us information concerning the magnitude of 
the differences between ranks. 
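
What a rank order does and does not tell us can be shown with a small sketch. The ratings below are invented for illustration, and the function name is my own; the only point is that two quite different sets of underlying magnitudes yield the identical rank order.

```python
def rank_order(scores):
    """Return ranks (1 = highest score) for a dict of name -> score.

    Assumes no tied scores.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


# Two hypothetical sets of teaching-ability ratings for Teachers A, B, and C.
ratings_one = {"A": 90, "B": 89, "C": 40}  # A barely ahead of B
ratings_two = {"A": 90, "B": 60, "C": 59}  # A far ahead of B

print(rank_order(ratings_one))
print(rank_order(ratings_two))
```

Both calls give A first, B second, and C third, although the distance between A and B is 1 point in the first set and 30 points in the second. Only an instrument further up the scale would preserve that difference.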

I will not pursue these matters further; as indicated earlier, excel- 
lent sources giving detailed differences among types of scales are 
available. There are, however, three additional points which I 
would make by way of setting up background for material to be 
covered later. 

1. As seen above, there are certain responses which are studied
by psychologists which can be described by physical or ratio scales. 
On the other hand, there are many responses for which no physical 
scales are appropriate. When a given response cannot be described 
by a physical scale, it must be measured by notations reflecting 
directly the discriminatory or perceptual responses of humans. If 
a characteristic of behavior can be described by perceptual responses 
in such a way that it is shown to vary systematically in amount, we 
have, I shall assert, demonstrated the existence of a psychological 
dimension and the instrument used to mirror this dimension is a 
psychological scale. The minimum requirement for establishing such 
a psychological dimension is two points; that is, Response A must be 
judged consistently to have a greater magnitude (or some other 
amount-like term) than Response B. 

2. The greater the number of useful units in our scale (whether 
physical or psychological) the better (more precise) will be our 
response prediction. By useful I mean units which will consistently 
reflect differences in behavior. This is a problem of reliability to 
which I will return more fully at a later point. 

3. Constructing psychological dimensions should not be thought
of as merely being a useful technique for response measurement.
In a general sense, the quantification of characteristics of objects
or behavior via the human discriminatory response makes these char-
acteristics available for further research as manipulable stimulus
conditions. On this point also I will speak at length in a later section 
of the chapter. 

RELIABILITY OF RESPONSE MEASUREMENT 

Once we have identified the dependent variable, the response, 
which we wish to investigate, and once we have determined that 
some level of quantification is feasible, the next step is ascertaining 
the reliability of the scale. Unfortunately, this is a matter which too 
many of us overlook and yet as far as the actual research is con- 
cerned, it is of the greatest importance. If our response measure is 
not reliable, no further investigative procedures should be under- 
taken. Science attempts to discover and understand reproducible 
phenomena; lack of reliability in our attempts at measurement pre- 
cludes this reproducibility. 

In the field of psychology I think those who construct tests have 
been far ahead of the strict experimentalists on this matter of reli- 
ability, the clinicians in general considerably behind. When a paper- 
and-pencil test is constructed, about the first thing the investigator 
does is determine its reliability, and the index is usually high for 
such materials. It may be that the investigator cannot find any other 
behavior which is correlated with the test behavior, but, by golly, 
he knows that whatever he is measuring he is measuring consistently. 
Other comments apropos to reliability may be incorporated in some 
illustrations. 

Too often we take reliability for granted; I think we are likely to 
do this most readily when some equipment or mechanical instru- 
ments are involved. Instruments seem to have a halo of precision 
about them which tends to make us take their reliability for granted.
But, in the current widespread use of electronic equipment, subject 
to continual breakdown, reliability must be checked repeatedly. Let 
me give you one illustration of the necessity of this. 

During the later part of World War II, Air Force psychologists 
developed a gunsight which was to be used as a test for selecting 
men who would have high probabilities of becoming good gunners. 
This sight, an actual replica of the sight used on the B-29 at that 
time, gained its face validity because the subject actually aimed and
reliable, we have only the first step in a research procedure, for
certainly a reliable response measure is not very useful unless it can 
be shown to be related to something. But, in the above illustration, 
the low correlation between the two indices of tension cannot be
attributed to the low reliability of the basic response measure (the 
tension index derived from the written protocols) although it might 
be attributed to the low reliability of psychiatric judgments. 

As indicated earlier, considerable research effort goes into the 
dimensionalizing of characteristics of objects or symbols when these 
objects or symbols are to be used in subsequent investigations. 
There are many illustrations. An attitude scale may be constructed 
by having judges sort statements of opinion into piles along a de- 
fined continuum, the ends of which represent the extreme of the atti- 
tude involved, and the middle a neutral attitude. If a number of such 
items or statements can be shown to have high reliability, i.e., if the 
judges agree on the “degreeness” of attitude implied by the written
statement, and if the dimension is well represented by statements 
having high reliability, the scale can then be used as a response- 
measuring instrument, say, in investigating conditions which might 
change the attitude. In verbal learning studies it is often necessary to 
dimensionalize certain characteristics of the material before pro-
ceeding with experimental work. Such characteristics as similarity,
meaningfulness, familiarity, and affectivity have been dimension- 
alized by judges, and if reliability of the judgments is obtained the 
material can be used in subsequent research to discover the effect of 
the characteristics on various learning phenomena. While most of 
the scales have been constructed for work with human subjects, it 
is quite feasible to carry out operations whereby lower animals 
serve as “judges.” For example, Harlow and Meyer (/^) dimension- 
alized attractiveness of five different foods for monkeys by a paired- 
comparison technique. Knowing the value (to the monkeys) of each 
food, these foods can then be used in subsequent investigations to 
determine the effect of the values on certain learning behavior. 
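
The arithmetic behind a paired-comparison scaling can be sketched briefly. This is a minimal illustration under my own assumptions: the item labels and choices are invented, and scoring each item by the proportion of its comparisons in which it was chosen is only the simplest of several possible scoring rules. It is not offered as the procedure used in the study cited above.

```python
from collections import Counter


def preference_scores(choices, items):
    """Proportion of its comparisons in which each item was chosen.

    choices: list of (chosen, rejected) pairs, one per trial.
    """
    wins = Counter(chosen for chosen, _ in choices)
    appearances = Counter()
    for chosen, rejected in choices:
        appearances[chosen] += 1
        appearances[rejected] += 1
    return {item: wins[item] / appearances[item]
            for item in items if appearances[item]}


# Hypothetical trials: each pair of three items presented twice.
trials = [("A", "B"), ("A", "B"),
          ("A", "C"), ("A", "C"),
          ("B", "C"), ("B", "C")]

scores = preference_scores(trials, ["A", "B", "C"])
print(scores)
```

With these invented choices the items order themselves A, B, C. In practice one would also want to know that the choices are consistent from session to session before using the scale values as stimulus conditions.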

We need not pyramid the illustrations. In all cases, to repeat, the 
initial usefulness of the response measures depends upon the reli- 
ability of these measures. 

What is acceptable reliability? It is by now quite apparent, I think, 
that in my opinion the measuring of response reliability is manda-
they evoke must be handled with the same attention to measurement
problems as any other instrument. From these tools the raw data 
derived consist of written protocols. The investigator must then 
categorize what is believed to be relevant remarks or ideas in the 
protocols. If he is reliable within himself in this categorizing and if 
this is not an artifact (as it may well be), the response measure has 
usefulness. However, for research purposes, the use of several judges 
showing interjudge reliability is to be preferred. 

The use of written documents to obtain response measures be- 
lieved to be reflections of important aspects of behavior is somewhat 
on the increase. Since it is a somewhat unusual method of securing 
measurements of behavior, an illustration will be given. In 1947 
Dollard and Mowrer (7) published a technique for deriving an 
index of tension from written documents. Essentially the technique 
consisted of analyzing a passage written by a patient by counting 
the number of clauses or phrases which to these investigators implied 
high tension or anxiety and the number which implied low tension 
or anxiety. A ratio between these two measures was used as an over- 
all index of anxiety. In this particular article the authors were not 
concerned with what this tension measure might be related to; they 
were interested only in presenting the method and in showing that 
a reliable response measure could be derived. This they did. Inter-
correlations among 10 independent scorers were quite high, much to
ciation value of nonsense syllables. Even with a fairly limited number
of responses for each subject the reliability was .79. One of my col-
leagues (8) built an apparatus a few years ago which required simul-
taneous activity of both arms in adjusting two levers. Although the
mechanics of this apparatus were fairly complex, the odd-even trial 
correlation over 20 trials was .97. So, I would say, that while we 
must insist on reliability indices for new response measures, only 
rarely will they be low if the investigator has any “feel” at all for 
the factors in the situation which might introduce extreme intra- 
subject variability. 
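
An odd-even trial correlation of the kind just mentioned can be computed as in the following sketch. The scores are invented (four subjects, six trials each), and the function names are my own. The Spearman-Brown step-up at the end is a standard adjustment for half-length scores which I add as a supplement; it is not mentioned in the text above.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def odd_even_reliability(trials_by_subject):
    """Correlate each subject's total on odd trials with his total on even trials."""
    odd_totals = [sum(trials[0::2]) for trials in trials_by_subject]   # trials 1, 3, 5, ...
    even_totals = [sum(trials[1::2]) for trials in trials_by_subject]  # trials 2, 4, 6, ...
    return pearson_r(odd_totals, even_totals)


def spearman_brown(r_half):
    """Estimate full-length reliability from a half-length correlation."""
    return 2 * r_half / (1 + r_half)


# Hypothetical scores: one row per subject, one column per trial.
scores = [
    [2, 3, 2, 3, 2, 3],
    [5, 5, 6, 6, 5, 5],
    [8, 9, 9, 8, 9, 9],
    [4, 4, 3, 4, 4, 5],
]

r_half = odd_even_reliability(scores)
print(round(r_half, 3), round(spearman_brown(r_half), 3))
```

With so few subjects the coefficient means little; the sketch is meant only to show the bookkeeping.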

MULTIPLE RESPONSE MEASURES 

There are four matters of widely varying importance regarding 
multiple response measures which need discussion. These four are:
(a) the situation in which a research procedure yields two or more
response measures which are highly correlated; (b) the situation 
which produces two or more response measures which are poorly 
correlated; (c) response correlation as an independent technique of 
research, and (d) response correlation and causality. 

Multiple response measures highly correlated. There are many
research situations in which the investigator records more than one
response. For example, in learning studies it is quite common to
record trials to learn, errors, and perhaps some other measure, such
as latencies. Given this type of situation, we are concerned at this
point only with the case where these response measures are highly
correlated. Actually, little need be said about this. If two or more
response measures are highly correlated, obviously we can use them
to define a single phenomenon (some might prefer to say that we
would infer a single process). In fact, if we have shown that we have
two or more such measures, it becomes a rather redundant procedure
to continue recording both in subsequent investigations. We could
instead use any one of the measures with high confidence that we
are measuring a single phenomenon. Which one we use becomes
largely a matter of personal choice. If the recording of one response
involves an expensive piece of apparatus and the other doesn’t, our
choice is made (unless we have a government research contract); in
general we would choose the one which is most economical and




tory. It is, therefore, quite a legitimate question to ask when a 
response measure is reliable and when it is not. How large must a 
reliability coefficient be before the response measure can be accepted 
as a useful index of behavior? Unfortunately, a categorical answer to 
such a question is impossible. When a paper and pencil test is de- 
veloped there are likely to be lifted eyebrows if the correlation co- 
efficient is not at least .80 or more. But, we are all aware that the 
numerical value of the correlation is affected by several factors over 
which we have little control. For example, it is common knowledge 
that the use of a very homogeneous population will usually reduce 
the correlation as compared with a heterogeneous population. Split- 
half versus a test-retest technique is another matter affecting the cor- 
relation. Also, we know that if the response index has a very limited
range imposed by the nature of the task the numerical reliabilities 
will be low. For example, in verbal learning studies, interday reli- 
abilities of recall may run no higher than .50 and may be as low as 
.20. This would hardly seem to be satisfactory as a response measure 
even though these values differ significantly from zero. However, if 
one examines the situation it is discovered that the range of scores 
possible on such recall tests is so limited that individual differences 
cannot be fully reflected. That is, there may be 10 possible items to 
recall but because of the particular conditions of the experiment the 
total range recalled may vary from, say, 3 to 8. It is nearly impossible 
statistically to produce high coefficients of reliability from such 
data. The reliability coefficient of such response measures must be 
supplemented by the lawfulness of results which can be produced 
from experiment to experiment. 
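
The effect of a homogeneous group on a reliability coefficient can be seen in a small simulation. Everything in it is an assumption made for illustration: each subject is given a "true" score, two daily measurements are formed by adding independent error of the same size, and the homogeneous group is obtained by keeping only subjects whose true scores fall in a narrow band. The measuring instrument is identical in both cases.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)   # fixed seed so the sketch is repeatable
N_SUBJECTS = 5000        # assumed sample size
ERROR_SD = 0.5           # assumed day-to-day error of measurement

true_scores = [rng.gauss(0.0, 1.0) for _ in range(N_SUBJECTS)]
day1 = [t + rng.gauss(0.0, ERROR_SD) for t in true_scores]
day2 = [t + rng.gauss(0.0, ERROR_SD) for t in true_scores]

# Heterogeneous group: everyone.
r_full = pearson_r(day1, day2)

# Homogeneous group: only subjects whose true score lies in a narrow band.
keep = [i for i, t in enumerate(true_scores) if abs(t) < 0.5]
r_restricted = pearson_r([day1[i] for i in keep], [day2[i] for i in keep])

print(round(r_full, 2), round(r_restricted, 2))
```

Under these assumptions the coefficient for the whole group should fall near .80, while the coefficient for the narrow band should be far lower, in the neighborhood of .25, although the error of measurement is the same in both.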

This is enough on response reliability. When a new response meas- 
ure is used, or when an old one is markedly modified, we must make 
it common practice to derive an index of reliability. When the reli- 
ability is established, and only then, can systematic research be
undertaken. Perhaps I have made too much of this issue. Actually, 
for the usual type of response measure, the reliability is likely to be 
high. And, I might say that there is something quite comforting in
devising a new test or task and finding the reliability of the perform- 
ance on this task to be very high. One of my students (/y) recently 
worked out a technique to measure the associative capacity of sub- 
jects, a technique quite similar to that used in determining the asso- 
measures have been used to infer strength of conditioning, such as
amplitude, latency, frequency, and resistance to extinction. Various 
experimenters have used these measures interchangeably without 
first determining that they were highly correlated, hence, equally 
satisfactory to reflect the amount or strength of conditioning. And 
correlations between some of these response measures may be very
low (3, 12). 

What position are we in when we use two response measures to 
infer a single phenomenon when these response measures are poorly 
correlated? It is primarily a definitional or conceptual problem. It is 
a definitional or conceptual problem because if we have two poorly
correlated response measures we actually must have two fairly inde- 
pendent phenomena and these should be so defined. Failure to do so 
may have at least two important complications. Not only can theoretical
systems based on wanton interchange of such measures be
quite misleading, but also I suspect that a number of so-called em- 
pirical contradictions in the literature may be a consequence of the 
fact that different response measures were used by different investi- 
gators and these measures were not highly correlated. 

Response correlation as a tool of research. Response correlation 
may take at least three different forms as a complete tool of research. 
These forms are not independent, but I wish to mention them separ- 
ately since they emphasize somewhat different rationales. 

1. Simple test validation. Whether a test be a paper-and-pencil 
test, a performance test, or a projective test, the aim is that of predicting
the behavior of the individual in situations other than the
test situation. An investigator (for example) constructs a test which
he believes will pick out potentially good supervisors from potentially
poor supervisors. To get an index of validity he correlates test
performance with subsequent supervisory performance. Or, an army
psychologist, interested in predicting marksmanship performance, 
might correlate steadiness and marksmanship to see if the two are
stemming in part, at least, from a common process or processes. As 
we all well know, predicting complex performance such as super- 
visory success or vocational success does not come easily. However, 
for our purposes the success or failure of such ventures is not 
particularly relevant. The germane point is the intent of the investigator
in using response correlation. What he attempts to do is, by
practical to work with, reliabilities being equal. Nothing prevents
us from continuing to record both measures, but little is gained by it. 

I will give just one actual illustration of what I mean by high cor- 
relation between multiple response measures. Marquis was in- 
terested in measuring reactions to frustration shown by newborn 
infants. The infants were observed for 10 days in the hospital immediately
after birth. To attempt to produce frustration, the infant
was allowed to have a bottle of milk for a short period of time, then 
the bottle was withdrawn, given again, withdrawn, etc. During the 
intervals between the short feeding periods, five different responses 
were recorded: (a) amount of mouth activity; (b) amount of gen- 
eral bodily activity; (c) frequency of crying; (d) latency of mouth 
activity; and (e) latency of general bodily activity. The lowest 
inter-correlation among these various measures was .80, even with 
a very small number of subjects. It seems apparent, that any one of 
these could be used as an index of frustration without fear that sig- 
nificant data were being lost by not recording the others. 

Multiple response measures poorly correlated. In some research 
situations several response measures are recorded which are not
highly correlated. Thus, responses to a Rorschach inkblot are cate- 
gorized as movement responses, form responses, and so on. Presum- 
ably, these categories have low intercorrelations. This means that the 
investigator is simultaneously measuring different phenomena (or 
processes, if you prefer); a high frequency of movement response 
ostensibly means quite a different thing from a high frequency of 
form responses. In verbal learning experiments, rate of learning and 
frequency of overt errors have no relationship; the correlation is
zero. Apparently, different mechanisms or processes are involved in 
the production of the two response measures. Actually, no imme- 
diate problem is evident if it is clearly shown that the response 
measures from a given situation have low intercorrelations and if 
the experimenter, therefore, concludes that he is dealing with differ- 
ent phenomena. 

A problem with low intercorrelations among response measures 
may arise, however, if the response measures are obtained in different 
experiments and if an investigator uses them to infer the same 
process or define a single phenomenon. This problem has become a 
real one in the case of certain conditioning data. Several different 
in rote learning. Both response measures were shown to be reliable,
but no relationship was evident between amount of childhood 
punishment and error frequency. Thus goeth much research. 

3. Factor analysis. The most refined and grandiose method of 
using response correlation as a basic instrument of research is that 
method known as factor analysis. Briefly, a group of subjects is 
given a large number of tests, say 20 or 30 which are presumed to 
sample all the various skills or capacities in a given domain, e.g., per- 
ceptual skills or motor skills or creative activity. Usually the tests 
are selected only following long periods of study of the various 
kinds of behavior believed to be involved in the domain selected for 
investigation. After the scores are obtained, correlations are calcul- 
ated between each test and every other test; then, to speak 
facetiously, axes are rotated, matrices are matricized, variances are 
variated, and vectors are vectorized. The purpose of this labor is to 
find out groups of tests which correlate highly with each other but 
not with other groups. Such a group of tests is therefore presumed 
to be measuring pretty much the same capacity or trait (factor) and
will usually be given a name. As a consequence of such analyses it 
may be found that most of the variance can be accounted for by 
perhaps 4 or d essential factors. On subsequent research or in actual 
selection procedures only tests which are relatively purified for these 
factors may be used. We can see, therefore, that factor analysis 
results in an economy in that it identifies the skills which are important
in a given area (domain) of behavior so that on subsequent
research the testing becomes very limited. 
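
The core idea of the paragraph above, finding groups of tests which correlate highly with each other but not with other groups, can be illustrated with a toy sketch. This is not factor analysis proper (nothing is rotated and no variance is partitioned); it is only a crude grouping by a correlation threshold. The six simulated tests, the two underlying capacities, the loading of 0.8, the threshold of 0.4, and the function names are all assumptions made for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_battery(n_subjects=2000, seed=3):
    """Simulate six tests: tests 0-2 tap capacity A, tests 3-5 tap capacity B."""
    rng = random.Random(seed)
    scores = [[] for _ in range(6)]
    for _ in range(n_subjects):
        capacity_a = rng.gauss(0.0, 1.0)
        capacity_b = rng.gauss(0.0, 1.0)
        for test_index in range(6):
            capacity = capacity_a if test_index < 3 else capacity_b
            # Each score is part shared capacity, part test-specific noise.
            scores[test_index].append(0.8 * capacity + 0.6 * rng.gauss(0.0, 1.0))
    return scores


def group_tests(scores, threshold=0.4):
    """Greedily group tests whose correlations with every member of the
    group all exceed the threshold."""
    groups = []
    assigned = set()
    for i in range(len(scores)):
        if i in assigned:
            continue
        group = [i]
        assigned.add(i)
        for j in range(i + 1, len(scores)):
            if j in assigned:
                continue
            if all(pearson_r(scores[j], scores[k]) > threshold for k in group):
                group.append(j)
                assigned.add(j)
        groups.append(group)
    return groups


battery = simulate_battery()
print(group_tests(battery))
```

With the simulated battery the grouping recovers the two sets of tests that were built from the two capacities. Real batteries are seldom so clean, which is one reason the more elaborate machinery of factor analysis exists.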

There can be no doubt about the general usefulness of factor 
analysis as a descriptive tool. For the particular battery of tests used, 
the resulting factors define the capacities or skills involved. In a real
sense it results in subject skills being defined by tests so that these 
skills have the status of a construct representing a broad area of 
capacity. If we then so wished on subsequent research we could
manipulate these subject skills. To this matter I will return later. 

The deficiencies of factor analysis (aside from any mathematical 
or statistical questions) lie not in the program of research outlined 
by eminent factor analysts, but in the failure thus far to carry out
such a program. Thus it is asserted that once factors have been de-
abbreviated observations of behavior, predict behavior in a distinctly
different situation. He may or may not be interested in the processes 
or skills as such which lie back of the performance; his interest may 
lie only in practical prediction. 

2. Personality correlates of nonclinical behavior. I can best tell 
you what I mean to include under this rather inept title by giving 
you an illustration of the research which typifies what I am think- 
ing about. It is represented in two theses done in our laboratory. 
An experimenter who has observed many subjects learn lists of 
words by rote cannot help but become curious about the personality 
variables or traits involved in such learning. It is a matter of fact 
that very little research has been done on this problem. By person- 
ality variables I am referring here to broad and rather loosely identi- 
fied traits such as dominance or introversion. In rote learning, a 
phenomenon which attracted our attention as a possible personality 
indicator was number of overt errors. Some subjects make many 
errors, others very few, and there is no relationship between error 
frequency and rate of learning. It, therefore, seems quite conceivable 
that error rates reflect personality differences which are relatively
independent of learning ability. More than that; one can generate 
specific hypotheses about errors and personality traits. For example, 
a subject who makes very few errors might be suspected of having 
a history in which he was severely punished (at home or school) for 
making errors. He might, therefore, have developed a generalized 
trait of caution against making responses unless he was fairly sure 
they would be correct. A person brought up in a loose disciplinary 
environment might, on the other hand, be relatively unconcerned 
about his errors. 

To see if it were possible to find personality variables or traits
related to error-making, Elkin (5>) had 125 subjects learn by rote a 
rather difficult list of adjectives, and also gave them the Minnesota 
Multiphasic Personality test. The question was simple: will any of 
the various scales of the MMPI correlate with error frequency? 
None did. 

In a second study by Singer (.2/) an attempt was made to test the 
specific hypothesis relating punishment in the subject’s past history 
to error-making. By a questionnaire, Singer got information on the
subject’s history of punishment and also recorded error frequency 
Correlation and causality. There is no agreement in science on
what the appropriate use of the term cause really is. Some scientists
even refuse the use of the term and prefer instead to speak merely of 
relationships. While I can appreciate the uneasiness attending the 
use of the word, it has a certain communication value and so I will 
not avoid it. For the time being I shall simply point out some diffi- 
culties which arise when the word is used in connection with inter- 
preting the results of correlation studies. My initial point (one which 
has been made by many other writers) will be that inferring cau- 
sality from simple correlations is an extremely dangerous pastime. 
Some rather extreme illustrations of this will be cited and then I 
will qualify the conclusion somewhat by looking at the way certain 
factor analysts conceptualize the problem. 

If we find that Form L and Form M of the Stanford-Binet Intelligence
Test correlate .95, I am sure that no one would conclude that
the behavior observed on one form caused the behavior on the 
other. We might be willing to say that some hypothetical capacity 
or skill (intelligence) was measured about equally by both tests and 
that this capacity or skill is the immediate source (cause) of the 
observed correlation. 

Suppose we notice that there is a high correlation between the 
number of people wearing raincoats and the amount of water in 
the storm sewers. We would not say that because people wore rain- 
coats the amount of water in the sewers increased; nor would we say 
that the great amount of water in the sewers caused the people to 
wear raincoats. Obviously there is some other factor which is re- 
sponsible both for the raincoats and the water in the sewers. 
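
The raincoat illustration can be put in numerical form. The sketch below is a simulation under assumed values, not an analysis of real records: daily rainfall is taken as the common factor, and both the raincoat count and the sewer level are built as rainfall plus independent noise. The partial correlation used here is a standard statistic brought in for the illustration; the text itself does not name it. All function and variable names are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate_common_cause(n_days=5000, seed=2):
    """Return (raw correlation, partial correlation controlling rainfall)."""
    rng = random.Random(seed)
    # Assumed causal structure: rainfall drives both observed measures,
    # and neither observed measure influences the other.
    rainfall = [rng.gauss(0.0, 1.0) for _ in range(n_days)]
    raincoats = [z + rng.gauss(0.0, 0.5) for z in rainfall]
    sewer_water = [z + rng.gauss(0.0, 0.5) for z in rainfall]

    r_coats_sewer = pearson_r(raincoats, sewer_water)
    r_coats_rain = pearson_r(raincoats, rainfall)
    r_sewer_rain = pearson_r(sewer_water, rainfall)
    return r_coats_sewer, partial_r(r_coats_sewer, r_coats_rain, r_sewer_rain)


raw, controlled = simulate_common_cause()
print(f"raincoats vs. sewer water:         r = {raw:.2f}")
print(f"same, with rainfall held constant: r = {controlled:.2f}")
```

In this simulation the raw correlation is high, but it all but vanishes once rainfall is held constant, which is what one expects when a third factor is responsible for both measures.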

Some practitioners of factor analysis have found it easy and useful 
to think of their factors as causes. Thus, if a boy has high numerical 
ability as shown by test scores, and if he gets a high grade in a course 
in arithmetic, there is a tendency to think of the high grade as being 
caused by high number ability. Discussions by both Cattell (4) and
Eysenck (/o) make it clear that basically they would like to impute 
causal status to factor-analytic factors but cannot do so with con- 
fidence until they find independent conditions which change or 
vary the amount of the factors involved. Thus, if we vary amount 
of a certain hormone and find that a given factor changes in amount
or magnitude, the factor in question can be given causal status in 
rived, major research into the variables affecting the factors can
be undertaken. 

. . . the rough factorial map of a new domain will enable us to proceed 
beyond the exploratory factorial stage to the more direct forms of 
psychological experimentation in the laboratory (23, p. 56).

But, the facts of the case are that this more direct psychological 
experimentation has been largely programmatic. Only recently have 
we had large scale attempts to get “behind” the factors and find out 
what variables influence them. For example, Cattell (4) is undertaking
studies on 1,000 children in an effort to determine the influence
of genetic and environmental factors on the capacities of
these subjects. Such studies as this must be carried out if factor 
analysis is to reach a stature whereby it allows an understanding of 
behavior over and above economical description. Thus, factor 
analysis first defines the important responses by cross-sectional
analysis and then subsequent longitudinal studies may aim at discovering
causal factors affecting these responses. No one is going to
assert that those using factor analysis are laggards; Cattell reports 
that some 4,000 tests have been used to explore personality (4). 

Although I might seem somewhat critical of factor analysis as 
practiced thus far, I would quickly add that those of us not primarily 
interested in it as a technique might make better use of it than we 
do. For example in the field of learning, even a restricted area such 
as rote learning, we do not know the relationships among perform- 
ances on various tasks. If we could have a grand factor analysis of a 
large number of rote learning tasks we could define the essential 
skills involved. Furthermore, we could then choose tasks which are 
relatively pure on these skills for subsequent experimentation. And 
then, if we manipulate a given condition for these representative 
tasks we can make statements about the universality or lack of uni- 
versality of the resulting relationship. At the present time, for ex- 
ample, if we determine the influence of a variable on paired-associate 
learning we haven’t any sound basis for generalizing, say, to maze 
learning, for we do not know how the skills necessary for these two 
tasks are related. In short, factor analysis, in my opinion, still has a 
large part to play in many areas of research, not simply in determin- 
ing personality traits. 
crass a fashion. That is, usually there will be at least a crude hypothesis
which determines the choice of the stimulus condition to vary
for a particular experiment. Nevertheless, the question describes the 
essential condition of research which is stimulus-oriented. 

In my way of thinking, low-level cause-and-effect relationships
appear with sharpness when we consider stimulus-oriented research. 
If we run a carefully controlled conditioning study in which the 
intensity of the conditioned stimulus is systematically varied, and if 
we find a related change in acquisition of the conditioned response,
I shall say that changes in the intensity of the conditioned stimulus
caused the change in behavior. If one wishes he might think of this 
as apparent cause, thus making explicit the recognition that there are 
mediating mechanisms (physiological mechanisms) which are the 
more immediate cause. Indeed, if one wishes to pursue the matter 
further he can reduce it to specific nervous function, or to chemical 
reactions, or whatever level of explanation one desires and is capable 
of justifying. Or, certain theorists may postulate hypothetical proc- 
esses which are related to the stimulus manipulation and these may 
be thought of as the cause. But, at the sheer empirical level, at the 
level of analysis of the experiment, the manipulated condition is as 
true a cause as we can possibly have. Empirical laws between stimu- 
lus variables and response measures are the basic facts from which 
more elaborate cause-and-effect chains start. 

I would like to make two other preliminary comments. First, 
while the problem of experimental design will be taken up in later 
chapters, I think it well to note the basic design problem present in 
all stimulus-oriented research. Obviously, if we are going to vary a
given stimulus condition, and observe changes in behavior, the essen- 
tial dictum is that only one such condition (be it a very simple or 
a very complex condition) should be allowed to vary systematically. 
This is commonly said to be holding all conditions constant except
one. Some comments have appeared in recent literature which imply 
that this basic principle of experimental procedure is outmoded. This 
is not true. One may vary more than one stimulus condition in a 
given experiment (multivariate designs) and it is very efficient to 
do so. But to draw a conclusion about the influence of any given 
variable, that variable must have been systematically manipulated 
alone somewhere in the design. Nothing in analysis of variance, co-
the same sense that any other construct is given such status. While
it might seem redundant to think of both the hormone and the factor 
as causing the change in behavior in a one-to-one relationship, we 
shall see in later chapters that many psychologists find it convenient 
to do so. 

I think I must make it clear that I am not saying that correlation 
never means causality; I am simply saying that we must be very
thoughtful about the matter before reaching such a conclusion. I 
think it is fair to say, albeit a relatively meaningless statement, that 
response correlations allow us (if we wish) to infer some common 
causal condition (process, state, capacity). It is meaningless because
we can infer the existence of such a hypothetical process with only 
a single response measure; we don’t need a correlation. The situa- 
tion is not amenable to cause-and-effect analysis until we can show 
how certain independent conditions will change the amount of a 
given factor as inferred from changes in scores on tests from which 
in turn the factor was inferred. 

In actual practice such inferences are usually made following 
some form of stimulus analysis and manipulation, a topic to which I 
now turn. 


STIMULUS ANALYSIS 

We have noted above that the essential rationale of factor analysis 
is to infer certain basic capacities of the organism as a result of re- 
sponse correlations. The stimuli involved in this situation are the 
tests: the original battery of tests, the scores on which provide the
raw data from which in turn the factors are extracted. These tests 
are selected because it is believed by the investigator that they will 
tap all basic capacities in a given domain. In the usual sense of the 
word, there is no single continuum along which the tests are ordered 
before testing begins. The research is clearly response-oriented. 
Stimulus-oriented research, on the other hand, has as its basic premise 
the manipulation of a specified stimulus characteristic and the deter- 
mination of change in behavior associated with the change in the 
stimulus. Put simply, the question is asked: “What variable stimulus 
conditions, when filtered through the organism, produce systematic 
changes in behavior?” This, of course, is the baldest type of empiri- 
cal question, and probably very few research workers operate in so 
a particular feature of the environment; the investigator does not
choose one randomly. Several illustrations will show that this is a 
most common type of research. 

1. Intelligibility of speech as a function of background noise. In 
such a study the investigator seeks a relationship between the intel- 
ligibility of transmitted speech and the intensity or complexity of 
background noise. 

2. Alpha rhythm as a function of flash duration. Here the ex- 
perimenter exposes the subject's eye to varying durations of light 
flashes and measures the attendant alpha rhythm of the occipital 
lobe. 

3. Forgetting as a function of length of retention interval. Variation
in time is one of the most common stimulus manipulations and
occurs in nearly every area of psychological investigation. 

4. Reading speed as a function of intensity of illumination. Work 
output as a function of type of music being played in the factory. 
Intelligence quotient as a function of nature of early environment. 
And on, and on. There is almost no end to the number of potential 
variables which constitute our environment. I want to consider just 
one more case, which is experimentally the same type of bald rela- 
tionships suggested above, but which conceptually should perhaps be 
kept distinct. 

5. Variations or manipulations in features of the environment are 
often conceptually related to changes in hypothetical processes in 
the organism. Variations which are produced in motivation by
varying the amount of reward provide an illustration. Here the in- 
vestigator, by manipulating the amount of food or amount of money, 
produces, or hopes to produce, changes in a process or state which 
he calls motivation. These changes may in turn be related to per- 
formance on a standard task. Now actually, any of the previously 
given illustrations could be conceptualized in the same way if the 
experimenter were so inclined. Thus, we might postulate neural 
blockage of some kind as being the intermediary between differences 
in flash duration and the alpha rhythm. It should be clear, therefore, 
that the operations involved in all of the above cases are basically the 
same regardless of how the experimenter may conceptualize his 
particular problem. I merely note this issue here for I will return 
to a full consideration of it in later chapters. 
variance, Latin squares, Greco-Latin squares, or Greco-Arabic-Latin
squares has abrogated the basic principle. These powerful designs 
and statistical tools may save wounded experiments, and they provide 
remarkable levers for extracting variances, but in actual operation 
there are no laws resulting from their use which obviate the neces- 
sity of holding all factors constant except one if we expect to con- 
clude anything about the effects of the factor. 
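
The point about multivariate designs can be made concrete with a two-by-two arrangement. The sketch below uses invented, noiseless cell means (a base of 10, with factor A adding 2 points and factor B adding 5); the function names and numbers are hypothetical. It contrasts a balanced comparison, where A is varied with B held constant at each of its levels, with a confounded comparison in which A and B change together.

```python
def cell_mean(a_level, b_level):
    """Hypothetical noiseless cell means: A adds 2 points, B adds 5."""
    return 10.0 + 2.0 * a_level + 5.0 * b_level


def main_effect_of_a(levels_of_b=(0, 1)):
    """Average effect of A, computed within each level of B.

    Within every comparison B is held constant, so only A varies.
    """
    differences = [cell_mean(1, b) - cell_mean(0, b) for b in levels_of_b]
    return sum(differences) / len(differences)


def confounded_difference():
    """Compare a group given (A=1, B=1) with a group given (A=0, B=0).

    Here A and B vary together, so the difference cannot be
    attributed to either factor alone.
    """
    return cell_mean(1, 1) - cell_mean(0, 0)


print("effect of A with B held constant:", main_effect_of_a())
print("difference when A and B vary together:", confounded_difference())
```

The factorial layout is efficient because every subject contributes to the estimate for both factors, but each estimate still rests on comparisons in which only one factor differs. That is the sense in which the basic principle survives.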

A second preliminary point concerns the handling of variables 
whose influence is unknown. Suppose we want to study the effects 
of variable A; what do we do about potential variables B, C, D, etc.? 
There have been statements pertaining to this situation which have 
a certain amount of nonsense in them: 

... the precise testing of a hypothesis generally presumes that one 
knows the relevant variables in the area of investigation, since without 
this knowledge it becomes difficult to establish adequate experimental 
controls. In such a case, an exploratory or formulative study is more 
likely to be fruitful than an experimental study (z^, p. 29). 

If this statement is taken at face value we would still be doing 
exploratory or formulative studies in all areas, for who can say 
when we know what all relevant variables are. Sound and precise 
experimental research does not hinge on our knowing what the 
relevant variables are; for the moment I ask you to accept this state- 
ment, as its defense and elaboration will not come up until later. 

The following discussion treats of two kinds of stimulus analysis, 
one in which there is active stimulus manipulation and one in which 
natural variation occurs and conclusions are drawn on the basis of 
post-hoc statistical control. The first will be broken into three sec- 
tions depending on the nature of the variables being manipulated. 

ACTIVE STIMULUS MANIPULATION 

Environmental variables. In these experiments the investigator 
chooses some feature of the environment which is capable of some 
form of quantification (as discussed in connection with response 
measurement) and his conditions consist of different amounts (per- 
haps only “qualitative” differences) of this feature. As mentioned 
previously, past results or theory usually dictate the investigation of 
task that was being tried out for subsequent experimental use. These
two types of instructions might be expected to produce differences 
in performance on the task. 

2. In a rote-learning task, what is the influence of instructing one 
group of subjects to guess and another group not to guess? 

3. In studying performance on a paper-and-pencil test, what is
the influence of a “speed set” versus an “accuracy set”?

4. In a social-psychology experiment, what is the influence of tell- 
ing one group of freshmen that their leader is a senior while another 
group is told that the same person is an instructor in the department? 

NATURAL VARIATION WITH STATISTICAL CONTROL

In this method of research, no active stimulus manipulation is 
involved. Record keeping is a fetish in our social order, both in 
governmental institutions and private institutions. It, therefore, be- 
comes theoretically possible to go back to records of individuals and 
try to find factors which are related to differences in behavior which 
have been noted. There are at least two different ways by which this 
has been worked out. First, different individuals may have actually 
been treated differently in some way. The investigator now attempts 
to search the records to see if the behavior differed as a consequence 
of the treatments. Secondly, differences in behavior of individuals 
may have been noted, and the investigator goes back to the records 
to see if there is one or more factors which might account for the 
differences in behavior. Again, let us look at some illustrations. 

1. For many years at the University of Minnesota a student 
counseling program has been carried on. Eventually someone began 
to wonder if this program was worthwhile. In an attempt to answer 
this question the investigator went back to the records, took a group 
that had been counseled and a group which had not been counseled 
and then made comparisons of subsequent scholastic achievement.

2. In an investigation of a sociological nature (/) the investigator 
tried to answer the question as to whether or not participation in a 
Boy Scout program contributed to better community adjustment. 
His procedure was to go back to available records, obtain a group 
that had had several years of scout work and a group that had had 
Task variables. In all of the above illustrations the manipulated
condition was extrinsic to the particular task on which the subject 
was measured. In the case of task variables, some particular charac- 
teristic of the task itself is varied and resulting changes in behavior 
noted. Again, a number of illustrations seem to be the best way to 
give a picture of this kind of stimulus manipulation. 

1. Rorschach responses as a function of the color or shading of 
the blots. Here, the actual task eliciting the behavior is changed in 
a certain way and observations are made of the differences in be- 
havior which result. 

2. Rote learning as a function of meaningfulness of the material.
The classical illustration is variation along a scale of meaningfulness 
of nonsense syllables. Does rate of acquisition of the syllables vary 
as a function of different levels of meaningfulness? 

3. Rate of acquisition of the pursuit-rotor skill as related to speed 
of rotation of the target. 

4. Test performance as a function of the number of alternative 
choices allowed on a multiple-choice type quiz. Or, test performance
as a function of similarity of the various wrong choices to the
correct choice. 

Instructional variables. While this form of stimulus manipulation 
might possibly be conceptualized as either environmental or task 
manipulation, I think it best to list it independently. In this type of 
research we attempt to vary the behavior of the subject by varying 
what we tell him about the task he is going to work on, or what the 
implication of his performance is, or how he should attack the prob- 
lem, and so on. There is almost no limit to the number of possible 
variations in instructions, although in actual practice not a great 
many have been investigated. In general, the intent of varying in- 
structions is to change the subject’s perception or evaluation of the 
situation and to determine whether or not such changes are related 
to his performance. One might also prefer to list this type of experimentation
under subject manipulation (to be discussed shortly); but,
let us look at some illustrations to see the nature of the manipulation 
without too much concern for the niceties of classification.

1. Learning as a function of ego-involvement. We might give one
group of subjects a learning task and tell them this task is actually a 
measure of intelligence, another group being told that it is a simple 
However, it is when a factor responsible for the differences is supposed
to have been found that one must look at the procedure with
great care before accepting the findings. 

QUANTIFICATION OF STIMULUS DIMENSIONS 

The discussion under this heading can be quite brief since the 
points that we have made concerning quantification of responses in 
general apply here also. We have noted that the more precise the 
quantification of the response, the more precise the prediction which 
can be made. In the same fashion, the more precise the quantifica- 
tion of our stimulus dimension, the greater the precision in our 
laws resulting from their manipulation. 

The stimulus dimension may be described along a physical scale 
or along a psychological scale. We have discussed these scales pre- 
viously and have seen how the construction of psychological scales 
for characteristics of objects or symbols often precedes actual stimu- 
lus manipulation of the characteristic. I cannot emphasize too 
strongly the importance of the classical psychophysical methods 
(and derivatives from them) as techniques for dimensionalizing 
stimulus characteristics. Paired-comparisons, rank order, single stim- 
uli, or even the methods of constant stimuli and average error can be 
adapted to these problems. All are powerful and extraordinarily useful
tools for dimensionalizing characteristics of materials for which
no physical scale is appropriate. All psychological dimensions used 
as stimulus variables eventuate from the reliable scaling of response. 
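
To make the simplest of these methods concrete, the sketch below scales a few stimuli by paired comparisons: every stimulus is paired with every other, a judge chooses one member of each pair, and each stimulus is scored by the proportion of its pairings in which it was chosen. The card names, the hidden values standing in for a judge, and the function names are all hypothetical. This is an illustrative sketch in Python, not a procedure taken from the text, and it omits the repeated judgments that would be needed to check reliability.

```python
from collections import defaultdict
from itertools import combinations


def paired_comparison_scale(stimuli, prefer):
    """Score each stimulus by the proportion of its pairings it wins.

    `prefer(a, b)` must return whichever of the two stimuli the judge
    picks as showing more of the characteristic being scaled.
    """
    wins = defaultdict(int)
    pairings_per_stimulus = len(stimuli) - 1
    for a, b in combinations(stimuli, 2):
        wins[prefer(a, b)] += 1
    return {s: wins[s] / pairings_per_stimulus for s in stimuli}


# Hypothetical judge: these hidden values stand in for one judge's
# impression of how much of the characteristic each card shows.
hidden_impression = {"card_A": 3, "card_B": 1, "card_C": 2}


def judge(a, b):
    return a if hidden_impression[a] > hidden_impression[b] else b


scale = paired_comparison_scale(list(hidden_impression), judge)
ranked = sorted(scale, key=scale.get, reverse=True)
```

With these assumed judgments `ranked` orders the cards from most to least of the characteristic. Whether such a scale is usable still depends on showing that a second set of judgments yields essentially the same ordering.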

Now, while we must continue to hold up precision of measurement
as a goal toward which we continually work for all of our
stimulus and response variables, we must also keep clearly in mind
the fact that research with a certain limited usefulness can be done
with extremely crude quantification of the stimulus. In the first
place we may have coarse stimulus dimensions in which quantitative 
differences are expressed entirely in terms of words. For example, 
suppose we wanted to measure Thematic Apperception Test re- 
sponses as a function of amount of trauma depicted by the cards. 
Judges might sort the cards into three piles representing high, me- 
dium, and low trauma, and, if this could be done reliably, we could 
proceed with the experiment. But we can be much more crude than 
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but little such work, and then measure differences in community 
adjustment. 

3. At one time when I was in the Navy, and time was particularly 
heavy on my hands, I decided to test the hypothesis that boys raised 
on a farm had better mechanical aptitude than those raised in a city. 
Records were available which allowed me to test this hypothesis; we 
had records of whether or not the boys had lived on farms for an 
appreciable length of time and we also had their mechanical aptitude
test scores. 

4. In records of members of the armed forces we could find a 
group representing individuals who were discharged during the war 
for nonphysical reasons and a group which was not so discharged.
We might then analyze other data in the records and see if we can 
isolate predischarge differences in these two groups. 

5. We could divide married couples into two groups: those who
have been divorced and those who have not been divorced. By
going back into the history of these two types of cases we might 
discover a factor or factors which would seem to be related to the 
response measure (divorced or not divorced). 

6. As a matter of fact, such problems can be worked out in the 
more staid experimental situation. One could keep records of various 
personal attributes of subjects as they serve in an experimental situa- 
tion. We might then at a later time attempt to discover if any of 
these factors appear to be related to responses recorded in the ex- 
perimental situation. 
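
The records-based comparisons listed above share one computational core: two naturally occurring groups are compared on a response measure. The sketch below uses the farm versus city illustration with invented scores and computes the mean difference together with Welch's t statistic. The data and the names are hypothetical, and Welch's statistic is my choice for the illustration rather than anything prescribed in the text. Even a large value shows only that the groups differ; it does not show which factor associated with farm life is responsible.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent groups (unequal variances)."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical mechanical aptitude scores taken from records.
farm_scores = [52, 55, 60, 58, 50]
city_scores = [48, 51, 47, 53, 46]

mean_difference = mean(farm_scores) - mean(city_scores)
t_statistic = welch_t(farm_scores, city_scores)
```

In a real study the groups would also have to be examined for other differences (age, schooling, and so on) before any factor could be put forward as the cause.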

I think it can be seen that the number of problems which might 
be approached by this method is nearly inexhaustible. It is a fact, 
however, that not a great deal of research of this kind is undertaken, 
probably because some of the "design” problems involved are nearly 
insurmountable. There are published reports of research using these
techniques which are utter farces as far as methodology is concerned.
I shall later expose you to details of some of these investigations so
that you can evaluate them for yourself. Nevertheless, the rationale
of these studies is the same as a cross-sectional type of experiment 
using active stimulus manipulation. The basic idea is to have some 
measure of behavior, and then try to narrow causative factors down 
to one. Failure to find a factor which will account for observed 
differences is not serious and at least in a negative sense worthwhile.
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is shown that with a particular age group learning is more rapid for 
material presented visually than for material presented aurally, train- 
ing situations could well make use of this fact. But, it should be clear 
that from the analytical view of science as presented here, showing 
such a difference would merely establish a phenomenon on which 
further analytical work is required in order to derive specific causal
relationships. 


UNITARY AND COMPLEX DIMENSIONS 

Stimulus dimensions may be unitary or they may be complex. By 
a unitary dimension I mean one in which only a single discernible 
characteristic is reflected. By a complex dimension I mean one in 
which two or more unitary dimensions combine to form one 
descriptive dimension. I do not wish to restrict complex dimensions 
to two levels, for, as we shall see, several combining levels may be 
apparent. 

Our simple physical scales are intended to measure relatively 
unitary characteristics. Frequency of sound wave will be said to be 
unitary as measured in cycles per second. Many, if not most, psycho- 
logical scales are complex. This is said with some evasiveness for a 
reason which will become apparent shortly. The dimensionalized 
characteristic is constituted of subsidiary dimensions which combine 
to make up the characteristic actually scaled. 

I have said that the success of dimensionalizing (with other than 
physical scale) of any characteristic of behavior or the characteristics
of objects rests on the reliability of the human discriminative
response. That is, to become repugnantly repetitious, we must have
reliability in our measuring instrument. Now, it is quite possible to 
scale a complex dimension reliably. I suspect that any attitude that 
is scaled represents the composite of several subsidiary dimensions. 
For example, attitude toward socialized medicine could be dimensionalized
along a single complex dimension. This dimension results
from some sort of summation of subsidiary dimensions, such as, say,
attitude toward bureaucracy in general, state of health, financial
status, and so on. It seems evident that in order to scale a complex 
dimension reliably, the relevant subdimensions must vary in some 
systematic fashion with each other. There are a number of reasons 
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this. Let us assume that we wanted to find out whether auditory 
presentation resulted in faster learning than visual presentation of 
material. We could clearly specify two such different conditions and 
the research could be carried out simply, but about all we could 
say when we finished is that there are differences or no differences
in learning as a function of mode of presentation. We could not,
with any assurance, state on what particular dimensions auditory
and visual presentation differ; one simply uses the visual system and
the other the auditory system. 

Let us look at another illustration. In a market research study we
might want to determine which type of furniture, modern or traditional,
was more preferred by a representative group of housewives.
Probably, we could discover that there are very strong biases on 
such matters, but to relate these biases to the particular character- 
istics common to both traditional and modern furniture would be 
an extremely tedious, and perhaps impossible, job. The major implication
of these illustrations is that they make apparent that in such
crude research the ability to infer cause-effect relationship is 
seriously restricted, for we cannot state on common dimensions 
all the ways in which such complex stimuli differ. This problem, 
that is, the problem of what may be called unitary versus complex 
dimensions, is an important one to which I will give more attention 
shortly. First, however, I wish to make two other comments con- 
cerning research in which stimulus differences are qualitative. 

In research dealing with these qualitative differences the design of 
the experiment can be perfectly sound; the limitation lies in the 
nature of the question such experiments can answer. Specific cause- 
effect statements cannot be made in the sense that we have discussed 
these statements earlier. Yet, such research may have considerable 
value. One of the initial tasks of a science is to establish reliable 
phenomena with which to deal. These experiments in which qualita- 
tive differences are used may at least establish whether or not there 
is a phenomenon. If so established, subsequent analytical research 
can be undertaken in an effort to discover the particular dimension 
or dimensions in the stimulus condition which are responsible for the 
phenomenon. So, from a strict scientific point of view, such experi- 
ments may have at best only a mapping function. On the other hand, 
these experiments may have considerable practical value. Thus, if it 
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present the details of the procedures used, but those interested may 
refer to several contemporary papers (/, i8j 20 ). 

To remove the above discussion somewhat from the abstract 
level, I will discuss an illustration which brings the issues down to a 
research level. This illustration actually involves characteristics of 
the subject per se, a topic to be considered in the next section. I will 
anticipate this section in this illustration because it is especially 
suited to emphasize the points I am making. 

Let us imagine that we wanted to carry out an experiment on the 
relationship between degree of adjustment of college students and 
critical flicker fusion frequency. I feel quite sure that able diagnosticians
(whether psychiatrists or psychologists) could “sort” a large
group of college students into a minimum of three groups, one being
a poorly-adjusted category, another well-adjusted, and the third
in-between. As usual, we would insist on reliability of the sorting.
Having done this, we have scaled a dimension of adjustment. But 
now, look at the characteristics of behavior which must have been 
considered by our judges in arriving at a decision for the placement 
of an individual in one of the three categories. One major dimension 
which a judge might use could be called “amount of anxiety.” But, 
the judge doesn't observe anxiety directly; rather, he observes other 
lower-order characteristics in order to make a judgment about 
amount of anxiety. For example, he might inquire into frequency of 
stomach disorders, dream content, fingernail biting, and other behav- 
iors which he believes to be indicators of anxiety. The amount of 
each of these would then be “summed” to get a judgment of anxiety. 
Then, there would be other major subdimensions of degree of ad- 
justment, such as amount of withdrawal. A sum of these major sub- 
dimensions, the amount of which was in turn determined by 
summing more unitary dimensions, becomes his final index of degree 
of adjustment. The diagnostician may, of course, use certain tests 
to aid in establishing the amount of a given characteristic, but it is 
quite clear that a great deal of a kind of mental “factor analysis” 
is involved in considering the importance of certain characteristics 
for the total picture, how the characteristics interact, how they com- 
bine, how much various ones should be weighted, and so on. When 
such multivariate mental manipulation and mensuration must take 
place one can but wonder why diagnostic attempts have any reli- 
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(which we need not consider here) why we might fail to dimensionalize
reliably a complex dimension, but when we do dimensionalize
one reliably we must infer that there is a systematic relationship
among the relevant subdimensions.
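
A small numerical sketch may help fix the idea that a complex dimension is "some sort of summation" of subdimensions which vary systematically with each other. Two subdimensions of adjustment, anxiety and withdrawal, are given invented ratings for five subjects; their Pearson correlation expresses the systematic relationship, and a weighted sum stands in for the complex dimension. The ratings, the equal weights, and the function names are assumptions made for illustration only; the text does not specify how the summation is carried out.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def composite(subscores, weights):
    """Weighted sum of one subject's subdimension scores."""
    return sum(w * s for w, s in zip(weights, subscores))


# Hypothetical ratings of five subjects on two subdimensions.
anxiety = [1, 2, 3, 4, 5]
withdrawal = [2, 1, 4, 3, 5]

r = pearson_r(anxiety, withdrawal)
weights = (1.0, 1.0)  # assumed equal weighting
adjustment_index = [composite(pair, weights) for pair in zip(anxiety, withdrawal)]
```

If the correlation among subdimensions were near zero, two judges weighting them differently would order the subjects differently, which is one way reliability of the complex scale could fail.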

I bring up this rather difficult problem of unitary and complex 
dimensions because one of the major tasks of our science is to 
reduce complex dimensions to their relevant unitary components. 
By this I mean the determination independently of the unitary 
dimensions which combine in some fashion to produce the complex 
dimension. It is only in this way that we can derive our most scientifically
useful cause-and-effect relationships, namely, the relationships
between dimensions described by a relatively unitary scale
and the behavior which results from the manipulation of that dimen- 
sion. Only in such instances do our laws become what I have called 
precise. And we will, at the same time probably discover that char- 
acteristics which we believed to be significant contributors to the 
complex dimension were in fact not. 

I do not think that any of us fail to see how complex dimensions 
may well be constituted of subsidiary dimensions. Yet, as these com- 
plex dimensions become broken down into more and more sub- 
sidiary dimensions, a real question may arise as to how we can tell 
when we have arrived at a relatively unitary dimension of behavior. 
I know of no satisfactory answer to this question from a practical 
research point of view. Before any complex dimension can be broken 
down, the investigator must have ideas or hypotheses concerning the 
nature of the subsidiary dimensions so that some sort of independent 
scaling attempts can be undertaken. Ideally, when a complex dimension
is reduced to a set of subdimensions which are relatively unitary,
these subdimensions may become the manipulable stimulus
conditions. Each may be manipulated independently to evaluate its 
influence, if any, on behavior which the investigator believes rele- 
vant. 

But, it is only recently that systematic attempts have been made 
to break down complex psychological dimensions into their com- 
ponent dimensions. In general, some form of factor analysis or deriv- 
ative therefrom is being used most successfully in this very impor- 
tant work. It is beyond the scope of the present discussion to 
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ability at all. It is just such situations as this which makes it all the 
more apparent that complex dimensions must be broken down into 
subdimensions which are more unitary in nature so that the influence 
or weight of each can be determined independently of the influence 
of others. I make no pretense that this can be done easily, but I also 
have made no assertions concerning the ease with which problems 
confronting a scientist are worked out. 


SUBJECT ANALYSIS 

I have thus far discussed some issues which I have felt to be espe- 
cially germane to response measurement and stimulus analysis. The 
third and central component in the research situation is the subject 
himself; indeed, the subject or organism is obviously the raison d’etre 
for psychological research. Some stimulus-oriented researchers 
would sometimes almost appear to resent the fact that a subject is 
necessary in order to derive relationships between environmental
(or task) variables and responses. But, whether we like it or not, the
subject remains. 

As indicated earlier, the number of environmental and task vari- 
ables which might potentially influence behavior is almost unlimited. 
So also is the number of characteristics or variables of the subject. 
Any characteristic of the subject which might be shown to vary 
reliably in amount among subjects is potentially a relevant subject 
variable. But a variable for what? Let me back up a bit before this 
question is answered. Because the previous material in this chapter 
has direct bearing on the needed discussion at this point, a series of 
brief statements should bring us to a position where subject char- 
acteristics or variables can be placed in their proper perspective. 

1. The first step in any such research is to quantify characteristics 
on which subjects differ. This quantification may take place at all 
levels and by all methods discussed earlier. We may use physical 
scales to obtain, for example, height, weight, and chronological age 
differences. At the other extreme we could identify qualitative dif- 
ferences such as race or occupational differences. 

2. On the level of personality and intellectual differences alone 
it would seem that the number of characteristics on which subjects 
differ is hopelessly large. It is here where factor analysis makes an 
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causing differing thresholds for embarrassment? In this kind of research
we start initially with subjects not differing on the subject
variable under investigation and then vary environmental conditions
in some systematic fashion to see if the subject characteristic is
affected. In the parenthetical comments above where conditions
were varied to change anxiety we were doing this very thing. The
well-known identical twin studies are essentially of this nature. At
birth, identical twins are assumed to be equal on subject character- 
istics. If one of the twins is placed in one environment and the other 
in quite a diverse kind, differences with respect to, say, intelligence 
which develop may be attributed to differences in the environment. 
Research using natural variation and statistical control is sometimes 
employed to determine factors influencing subject variables. The 
illustration given earlier in the chapter in which the investigator 
attempted to determine the influence of scouting on adjustment is 
such a study. 

Problems of research design in manipulating subject variables are 
theoretically no different from those present when stimulus manipu- 
lation is carried out. However, in actual practice, the working out of 
a design in which cause-effect relationships can be stated with con- 
fidence is a serious matter. With the rapid increase in the number of 
students in clinical psychology there has been a commensurate 
growth in research in which subject variables are manipulated. A 
very common procedure is to choose groups falling into different 
clinical diagnostic categories, expose all to a standard stimulus situa- 
tion (such as a learning task) and observe what if any differences in 
behavior occur. If differences do occur they are attributed to differ- 
ences in diagnostic categories. We shall see in later chapters that such 
experiments have often violated fundamental rules of scientific 
method. 
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not a mere matter of accepting or rejecting operationism, for to ask 
such a question is to ask whether one accepts science as a tech- 
nique for understanding the laws of nature. Indeed, I would assert 
that a criterion of whether or not a so-called empirical concept is a 
scientific concept is whether or not it has been operationally defined. 
The starting point of any science is, basically, a set of phenomena 
which have been operationally defined. For, in their simplest and
basic form operational definitions specify the measuring operations
used to identify phenomena. Thus, it seems to me that operational
definitions, stripped of their excesses, reflect measurements, and
reliable measurements are the roots of any science. Reliable measurements
initially identify the phenomena with which a science concerns
itself. An operational definition need not explicate a functional
relationship; it may only reflect a demonstration that a reliable
phenomenon exists. Further research may then be undertaken to
try to learn more about the phenomenon by discovering of what
variables it is a function and how it is related to other phenomena.
Operational definitions are not a science for they need not express 
relationships and they are not theory, but they are the necessary 
base for a science. 

An operational definition does not tell us much about what the
important or relevant variables of the defined phenomenon are. To 
repeat, basically an operational definition simply tells us that there 
is a phenomenon. Bergmann and Spence point up this matter as 
follows:

We see that even at the level of the empirical laws the scientist can- 
not derive any help from operationism. He will have to rely upon his 
own ingenuity and whatever help he might be able to get from an 
articulate theory (2, p. 5).

I have said above that operational definitions are not theory. 
Yet there are certain issues regarding the relationships between 
what I have called here operationally defined concepts and theoreti- 
cal concepts that must be explored. That is, operationally defined 
concepts are sometimes used as explanatory concepts; operationally 
defined concepts may, of course, enter into theoretical statements; 
operationally defined concepts are used by some writers as theo- 
retical concepts. And we are told by some that theoretical concepts 
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LATITUDE ALLOWED 

Partially as a means of setting before you illustrations of oper- 
ational and nonoperational definitions, and partially as a means of 
outlining the scope of operationism, I will take up three general 
problems which inevitably arise in thinking about the topic. In 
subsequent sections of the chapter I will give extended illustrations 
of operational definitions and in conjunction with these illustrations 
make further observations of the value and limitations of such 
definitions. 

Literary lead-ins. Definitions which are commonly found in stand- 
ard dictionaries I will call literary definitions. At this point I wish to 
contrast literary and operational definitions but at the same time 
point out the usefulness that some of these literary definitions have 
in psychology. Sometimes improved communication may result if 
the scientist precedes his operational definition of a new concept by 
a literary definition. These literary definitions may help the reader 
understand the general nature of the phenomenon which the scien- 
tist is trying to bring under careful scrutiny. An illustration will 
show what I mean.

Levinson (7) was concerned with a subject trait which he called 
ethnocentrism: 

Ethnocentrism is conceived here as an ideology: a relatively organ- 
ized, relatively stable system of opinions, attitudes, and values. The 
term “opinions” refers to ideas about the nature of social reality; these
include specific “factual” beliefs as well as more underlying imagery
of groups and institutions. . . . They are the psychological facts and
assumptions in the individual’s conception of society. The term “attitudes”
as used here refers to one’s readiness for action; it includes all
ideas about what should be done to, for, or against any social entity.
Values are the individual's standards of right and wrong, good and 
bad. (7, p. 19). 

At least in a general way, the above statements give one an appre- 
ciation of the behavior with which the investigator hoped to work. 
It is seen that the total state of ethnocentrism is believed to be a com- 
posite of three subsidiary states. Now, in this particular illustration, 
an operational definition of ethnocentrism was provided by a scale 
which Levinson constructed. Just how he got from the literary con- 
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The UR (unsuccessful reader) was defined as a child between the 
ages of 8-0 and 16-11 who had achieved either a Verbal or Perform- 
ance Scale IQ of 90 or higher, who had fallen 25 per cent or more 
below the mean reading grade level on the Wide Range Achievement 
Test, for a child of his chronological age, and who had attended public 
or private school for the expected number of years for his given age 
p. 268). 

I would prefer to give first a general operational definition of the 
unsuccessful reader, perhaps as follows: “An unsuccessful reader is 
one whose reading performance falls a specified amount below ex- 
pected performance for his age, education, and intelligence level.” 
Subsequently, this general definition can be reduced to a specific 
criterion imposed in this particular study. But, however one prefers 
to state the definitions, let it be clear that at one point or another 
in the exposition the basic operations must be stated, and if it is 
necessary to regress in order to establish communication it must be 
done. I would repeat, however, that I have been unable to find any 
illustrations in the literature where undue hardships have been caused 
by infinite regress. 
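
One advantage of a specific criterion such as the one quoted above is that it can be applied mechanically. The sketch below writes it as a predicate. The record layout and all names are hypothetical, and two readings are my assumptions: ages such as "8-0" and "16-11" are taken as years and months, and "the expected number of years" of schooling is treated as an exact match. It is offered as an illustration of how an operational definition reduces to explicit operations, not as the original investigators' procedure.

```python
from dataclasses import dataclass


@dataclass
class ChildRecord:
    age_months: int
    verbal_iq: int
    performance_iq: int
    reading_grade: float               # obtained reading grade level
    mean_reading_grade_for_age: float  # mean for the child's chronological age
    years_in_school: int
    expected_years_in_school: int


def is_unsuccessful_reader(c: ChildRecord) -> bool:
    """Apply the quoted criterion for the unsuccessful reader (UR)."""
    in_age_range = 8 * 12 <= c.age_months <= 16 * 12 + 11
    adequate_iq = c.verbal_iq >= 90 or c.performance_iq >= 90
    # "25 per cent or more below the mean reading grade level"
    reading_deficit = c.reading_grade <= 0.75 * c.mean_reading_grade_for_age
    full_schooling = c.years_in_school == c.expected_years_in_school  # assumed reading
    return in_age_range and adequate_iq and reading_deficit and full_schooling


# Hypothetical records.
child_1 = ChildRecord(120, 95, 85, 3.0, 5.0, 5, 5)
child_2 = ChildRecord(120, 95, 85, 4.5, 5.0, 5, 5)
```

Here `child_1` meets every condition, while `child_2` fails only the reading-deficit condition. The general definition I prefer corresponds to the shape of the function; the particular study's criterion corresponds to the constants inside it.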

Provisional definition. Many times a scientist sets out to demonstrate
a new phenomenon. That is, because of certain observations
or because of theory, the investigator believes there exists a phenom- 
enon which has never been investigated scientifically. If he gives 
(as he must) an operational definition of this expected phenomenon 
before the operations are actually carried out, I shall call it a pro- 
visional definition or pre-research definition. It must be provisional 
because the investigator may find either that he cannot carry out the 
operations or that no phenomenon is discovered.

But, what the investigator does is to say: “If I do this, and if such
and such happens, then I define the phenomenon by these oper- 
ations.” 

A considerable amount of research has been built around concepts 
which Freud advanced on the basis of relatively nonsystematic
observations. This research, in the last analysis, is an attempt to give
scientific status to the concepts; it is an attempt to give operational
definitions so that further work can be done to demonstrate the conditions
which cause the phenomena to vary, their interrelationships,
and so on. Or to put it bluntly, these studies have asked, or should
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of the criticisms sometimes directed at operationism is that a strict
interpretation of it would mean that each critical word in an oper- 
ational definition must itself be defined operationally, each critical 
word in these definitions so defined, and so on, so that there is an 
infinite regress of definitions. Let me first concede that indeed such 
regression must take place if necessary to establish communication. 
But also let me quickly add that in actual practice the results of this 
necessary concession are rarely onerous. As a science develops, 
standard concepts are developed. By “standard” I mean concepts 
that are understood by all. (Such terms are sometimes called primi- 
tive terms.) Such concepts may then be used in new operational 
definitions without acquiescing to a demand for regress. For example, 
if the term “Stanford-Binet IQ” is used in an operational definition,
few would need a second definition explicating what is meant by it. 

Sometimes in defining a new concept rather omnibus operational 
definitions may result. Although these may seem oppressive from a 
belletristic point of view, if such length is necessary to expose the 
meaning of the concept, it must be accepted. Let me give you an 
illustration, which while long, is still incomplete: 

.... I shall define operationally the strength of the cathexis to any type 
of goal by the tangent of the angle made by the line resulting from 
plotting the magnitudes of the measures for getting-to such a goal 
against the magnitudes of the measures for getting-to the standard goal 

i’S, p- 364)' 
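
The arithmetic implied by this definition is brief: the tangent of the angle that a straight line makes with the horizontal axis is simply the slope of that line. The sketch below fits a least-squares line to invented measures and reports its slope as the index. The quoted definition does not say how the line is to be obtained, so the least-squares fit, the data, and the names are assumptions of mine.

```python
from math import atan, degrees


def line_slope(x, y):
    """Least-squares slope of y plotted against x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical magnitudes of the measures for getting-to each goal.
standard_goal_measures = [1, 2, 3, 4]
test_goal_measures = [2, 4, 6, 8]

strength_index = line_slope(standard_goal_measures, test_goal_measures)
angle_in_degrees = degrees(atan(strength_index))
```

With these assumed values the slope is 2, so the goal in question would be assigned twice the strength of the standard goal on this index.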

This definition may need further elaboration by operationally 
defining what is meant by goal, standard goal, and magnitude of the 
measure. And in fact such definitions are given by the investigator. 
If such omnibus definitions are necessary because of the lack of 
standard concepts, the investigator has two alternatives (to indicate
the extremes). He may make only a general operational definition 
and indicate that the specific operations implied by the critical words 
in the definition will become clear when the details of the procedure 
are set forth. Or, the investigator may include in a single, long 
definition all detail that is necessary to establish communication. 
Personally, I prefer the former method as a means of keeping our 
scientific prose from becoming too ponderous. Here is an illustration 
of what I would call a ponderous operational definition. 
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discussed earlier. That is, I will not go into all the [...] operations
for each illustration, but will [...] that the details of the operations
could be added. Thirdly, [...] found it useful in some of the
illustrations to lead up to the operational definition with an
abbreviated literary definition.

RESPONSE IDENTIFICATION

[...] may be twofold. First and most [...] a group of individuals
to a static [...] the responses of the individuals to this set of
conditions [...] measured. The operational definition is complete when
it is shown that individuals do differ reliably, i.e., where [...]
scores remain relatively constant [...] or by other acceptable
techniques for determining reliability. [...] definitions result from
the initial standardization of any test, although they may be used
in a variety of situations.

Secondly, there [...] where the unit of measurement is a group [...]
individuals. In such situations the investigator must show that
groups differ reliably in their responses to a static set of
conditions [...] the group score is the unit in the distribution.
Certain social-psychology experiments make use of such definitions.
Let us look at a number of [...] these two kinds.

1. Suppose you believe that there is a characteristic of behavior
which no one else [...] investigated. Let us call this hypothesized
characteristic X. [...] initial procedure is to construct a test,
say, a [...] which you think “gets at” this particular characteristic
of behavior. You find that the test is reliable. Now your definition
is simply: “[...] characteristic measured by this test,” and you point
out or exhibit the [...]
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have asked: “Is there a phenomenon to be operationally defined?
Can repression, displacement, or transference be defined operationally,
or are these pseudophenomena resulting from the lack of control
for Freud’s observations or from false inferences from the observations?”
If the operations do establish that there is a certain phenomenon 
of the nature Freud reported, then an operational definition is given 
it and it can be brought into the realm of scientific investigation 
and discourse. If, on the other hand, no such phenomenon can 
be demonstrated by acceptable operations, the term remains outside 
the scope of science. This would not mean that Freud was wrong; it 
would only mean that the process he has indicated has no scientific 
meaning. If many and varied attempts are made to establish the existence
of the phenomenon, and if they are all unsuccessful, the probabilities
become great that Freud’s observations, or his interpretation
of those observations, were somehow in error.

FORMULATION OF OPERATIONAL DEFINITIONS 

For expository purposes I will say that operational definitions of 
phenomena are formulated by six different approaches. These six 
approaches are determined by the nature of the research situation. I 
make no claim that these encompass all research situations in psy- 
chology, nor that all can be clearly distinguished from the others. 
I do believe, however, that the discussion will cover most typical 
situations in which psychological research is carried on. To a large 
extent the six situations parallel the outline of the previous chapter 
where various components of the research situation were analyzed. 
I have named each of the six approaches, probably quite inade- 
quately, but such naming at least allows us to have a breakdown 
of the material into rough classes and perhaps facilitates communica- 
tion. 

Three other prefatory comments are in order. As indicated earlier, 
the different approaches will be extensively illustrated and in con- 
junction with some of these illustrations I will bring up general issues 
about operational definitions which have not hitherto been discussed. 
I do this in this fashion simply because certain of the illustrations 
make very clear the issues involved. Secondly, I will keep my illus- 
trations of operational definitions primarily on the general level, as 
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that it is not proper to ask such a question; it assumes that there is 
some other source to which one can appeal for truth. It is a perfectly
valid question to ask: “Do you think that this scale measures anxiety 
in the full range of behavior to which the term is commonly ap- 
plied?” Such a question can be answered by research if other or more 
expanded ideas of anxiety manifestations can be given adequate 
quantification. The facts are that manifest anxiety as defined by th 
scale has been found to relate to several other forms of behavior i- •, 
it is a useful tool for further research and anyone who looks at t 
operational definition knows the meaning of the term anxiety. 

4. Next, let us imagine that we wanted to give an operational 
definition of group morale, and that our groups are [...]. 
According to certain considerations we might [...] 
our literary conception of group morale [...] 
different amounts by such behaviors as number of AWOLs, number 
of visits to the dispensary, number of letters [...]. 
Indeed, we might use a number of [...] 
composite of them for our index of morale. If we can [...] that the 
measures are reliable we have given an operational definition to 
[...]. 

5. The term cohesiveness of a group means, [...] 
differ reliably on this response index, we [...] 
operational definition to [...]. 

6. Intelligence may be defined [...] or the characteristic 
measured by a specified [...]. [...] 
people think of the implications of operational definitions in two 
[...]. Some think of the response measure as a [...] 
infer a characteristic, state, or [...]; [...] 
think of it on the strictly operational [...] 
definition of [...] 
or “score.” In a later chapter I will discuss in detail differences in 
course, you have only defined your concept (X); what else it is 
related to, hence, how useful it is in understanding behavior, must be 
demonstrated by subsequent research. I shall examine this problem 
in more detail later when discussing another concept. 

2. Next, let us suppose you construct a performance test which in 
your opinion is tapping skill in basic mechanics. If you show that this 
test measures reliably, you may then define Mechanical Aptitude as 
what is measured by performance on the test. 

With regard to this test, let me set up an extreme situation. Let us 
say that you constructed a paper-and-pencil test which consisted 
largely of questions about the literature of the ancient Romans. 
Imagine further that the test measures reliably. Finally, suppose you 
say that mechanical aptitude is measured by performance on this 
test. This may sound ridiculous, and certainly, no one would ever do 
this. But I wish to point out that your operational definition of 
mechanical aptitude, defined as performance on a test in which the 
questions are about Roman literature, is perfectly sound. That is, 
you can easily defend such a definition on the grounds of strict 
operationism. However, you cannot defend it on social scientific 
grounds; you certainly would be accused of a lack of propriety in 
assigning names. One of the purposes of operational definitions is 
to facilitate communication, and the above naming might actually 
hinder communication. 

3. One of my colleagues (12) developed a scale to give oper- 
ational definition to manifest anxiety. This scale, consisting of se- 
lected items from the Minnesota Multiphasic Personality Test which 
[...] shown to be 
reliable. So, the definition of anxiety is given by the measuring in- 
strument; a person who makes a high score is said to have high 
anxiety. 

Now, you may have an impulse, as a number of others have, to 
ask: “But does this scale really measure anxiety?” What this question 
seems to imply is that the one asking the question doubts that the 
scale measures all characteristics of behavior which have been labeled 
anxiety by clinicians. This may be quite true, but such considera- 
tions [...] considering the adequacy of the definition 
[...] perfectly sound. So, I would say 
range of behavior tested and which would be defined as intelligence. 
(Cf., Spiker & McCandless, 10, for an extended discussion of the 
status of the concept of intelligence.) 

7. Simple response-defined concepts are not limited to testing 
situations in the clinical sense. In the field of learning, for example, 
there are many such definitions. Operant level may be defined as 
the number of unreinforced bar pressings in 30 minutes in a Skinner 
box; or, exploratory behavior as the number of traversals of 15-in. 
maze units per unit of time for rats satiated with water and food (9). 

I think these are enough illustrations of operational definitions 
which I have tagged simple response definitions. These are the most 
elementary operational definitions possible in psychology. In 
situations to which these are applied we need only demonstrate a 
reliable individual (or group) difference in behavior. 

Complex response identification. Essentially, definitions falling in 
this category are elaborations of simple response identification. 
Several discrete response measures go together to define the con- 
cept. These response measures are not “put together” in an arbitrary 
fashion; rather they become the base for a concept [...] 
correlational criteria are met. The best illustration [...] 
definition is given by factor analysis. Let us review this procedure [...]. 

A large group of carefully selected tests is given a sample of indi- 
viduals. Each individual test, if reliable, can form the base for an 
operational definition as discussed under simple response identifica- 
tion. By factor analysis, intercorrelations among test scores are 
determined and a group of tests which [...] them- 
selves, but not with other groups, define the [...] 
is not directly defined operationally in terms of [...] 
among individuals, but rather in terms of its [...] 
have little or no relationship with other factors derived from the 
battery of tests. So, the final definition is in terms of perform- 
[...] 
to differentiate one phenomenon from another [...] 
response identification. Identification of [...] behavior 
[...] other illustration. Presumably, no single [...] 
will differentiate one clinical syndrome from all others. Clinical 
conceptual levels of thinking. At the present time let me say only 
that both methods of thinking about such concepts are used. In the 
case of the concept of intelligence I want to again digress to discuss 
issues about the concept which periodically crop up in our litera- 
ture. 

For some insistent reason many psychologists, when considering 
the concept of intelligence, have developed the peculiar idea that 
not only should a definition define but it should lay bare the very 
nature, essence, or significance of intelligence. An operational defini- 
tion is a definition; it is not supposed to expose the whole truth of 
nature or the order of the universe. In the case of intelligence all 
the definition does is to state that individuals’ performances differ 
reliably on a test that is called an intelligence test. It has been said 
that an operational definition of intelligence such as I have given is 
a sterile as opposed to a “dynamic” concept. I am not concerned 
about the fertility of definitions nor the dynamism of words; I am 
concerned only with specifying the basic meaning of a concept. 
Response-identified operational definitions are not supposed to repre- 
sent relationships between dependent and independent variables. 
Basically, an operational definition asserts only that a phenomenon 
has been reliably measured. When we operationally define other 
concepts, say learning, we are not expected to indicate in the defini- 
tion all the variables of which the phenomenon is a function, how 
it will help raise the level of civilization, or how to make the United 
Nations an effective instrument for world peace. So too, when we 
define intelligence we are not obligated to show that it correlates 
with school grades, that business executives have more of it than do 
ditch diggers, or that geniuses have more than idiots. The operational 
definition exposes the operations by which you are quantifying the 
phenomenon; it is not both the starting and stopping point of 
science. There are those who complain that an operational definition 
of intelligence does not really say what intelligence is. To say what 
intelligence is in a scientific sense is to say what scores on the test 
relate to, and relationships are obtained by investigation and re- 
search, not by definition. Of course, it is always a perfectly legiti- 
mate question to ask whether or not a particular test samples well 
all the behavior which in a literary sense would be called intelligent 
behavior. Such a question might even lead to a broadening of the 
general type of operations involved, a single illustration should be 
sufficient. 

Let us imagine that an investigator sets out to dimensionalize the 
affective tone of verbal materials. He believes that certain words 
evoke feelings of unpleasantness, others feelings of pleasantness, 
others neutral feelings. To a group of judges he presents, say, 100 
words, which he thinks are representative of the dimension as he 
conceives it. The judges are asked to rate these words on a 7-point 
scale from most pleasant to most unpleasant. If he shows that the 
words (although not necessarily all of them) can be allocated along 
the scale in a reliable fashion by the judges, he has, by noting his 
procedures, given an operational definition to affective tone. He may 
further wish to specify, for example, that all words with a rating of 
2.0 or less will be called unpleasant, all of those with a rating of 5.0 
or more pleasant, and all in between neutral. If he specifies the in- 
structions to his judges, and other relevant details of the scaling 
technique, the operations are repeatable. 
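
The naming rule in this illustration can be written out as a minimal sketch. It assumes that each word's rating is the mean of the judges' ratings and that 1 is the unpleasant end of the 7-point scale; the words, the ratings, and the function name are invented for illustration, and the sketch does not check whether the judges agree reliably.

```python
from statistics import mean

UNPLEASANT_MAX = 2.0  # cutoff given in the text
PLEASANT_MIN = 5.0    # cutoff given in the text


def classify_word(ratings):
    """Label a word from its judges' ratings on the 7-point scale.

    Assumption: the word's scale value is the mean of the ratings,
    with 1 = most unpleasant and 7 = most pleasant.
    """
    average = mean(ratings)
    if average <= UNPLEASANT_MAX:
        return "unpleasant"
    if average >= PLEASANT_MIN:
        return "pleasant"
    return "neutral"


# Hypothetical ratings from four judges, invented for illustration.
ratings_by_word = {
    "funeral": [1, 2, 1, 2],
    "table": [4, 3, 4, 4],
    "holiday": [6, 7, 6, 5],
}

labels = {word: classify_word(r) for word, r in ratings_by_word.items()}
print(labels)
```

Writing the cutoffs down as named constants is one way of making the operations repeatable by another investigator.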

I would like to point out in concluding this section on response 
identification that none of the operations produces [...] relation- 
ship in which stimulus variables are involved. This is [...] re- 
lated to the discussion in the previous chapter [...]. 
Response correlation rarely gives us a basis for [...] cause 
and effect. We shall see that in the next general [...] 
definition, stimulus-response identification, we usually [...] 
a crude stimulus-response relationship indicated in [...] 
definition since some form of manipulation of a stimulus variable is 
necessary to define the concept. 

And, let me repeat that the [...] 
defined by response identification [...] not [...]. 
What variables influence the [...], [...] 
to other concepts, and so on, are matters for subsequent research. 

STIMULUS-RESPONSE IDENTIFICATION 

Under this general heading I have three divisions [...] some- 
what different sets of operations. I am not satis- 
fied with the names nor with [...] categories, but 
they are the best I have been able to formulate. 
groups are based on a composite of several responses. Insofar as 
such grouping can be reliably formed, whether by tests, clinical 
judgment, physiological measurements, or combinations of all, an 
operational definition is formulated provided the criteria for inclu- 
sion and exclusion in classes are specified. The response measure, 
regardless of level of quantification, must be a part of the public 
record in the definition. We cannot define a concept operationally 
on the basis of unquantified intuitions. The operations must be 
repeatable by others. 

The use of factor analysis to define concepts has certain merits 
relevant to the over-all picture of operationism. I have said that one 
of the virtues of operationism is that it restricts the number of con- 
cepts. A concept, to be operationally defined, must result from 
procedures which are basically different from those used to define 
another concept. This problem concerning plurality of concepts 
will be returned to later in the chapter, but I need to open the discus- 
sion at this point. An operational definition of a phenomenon can be 
made by the use of a single, simple, short, paper-and-pencil test 
according to the criteria discussed under simple response identifica- 
tion. Now, conceivably, the number of such phenomena which 
could be so defined is almost unlimited. It therefore becomes appar- 
ent that the number of operationally defined concepts might be 
multiplied excessively. And it is a fact that we have no over-all plan 
in our scheme of science of psychology to avoid such multiplicity. 
But it is here where factor analysis makes a strong contribution. 
Independent operational definitions should be maintained only for 
unique phenomena. By factor analysis, tests which measure essen- 
tially the same characteristic of behavior lose their individual 
identity for definitional purposes. Each test becomes just one of a 
group of tests measuring the same characteristic, and a single defini- 
tion is used to reflect the operations defining the factor measured 
by all the tests. 

Scaling identification. As mentioned in the previous chapter, the 
human discriminatory response may be used to dimensionalize char- 
acteristics of objects or events (such as responses). The procedures 
involved constitute the operational definition of the characteristic. 
Since we have discussed at some length in the previous chapter the 




varying amounts of the perimeters missing and asked subjects to 
report when they did and did not see a triangle. He then defined 
closure as the amount of perimeter which must be present before 
50 per cent of the subjects reported seeing a triangle. The steps in 
this procedure operationally define closure. 
ations have been undefinitive in demonstrating [...] 




S-R identification with physical scale. By this technique the oper- 
ational definition of a concept is obtained by showing a relationship 
between a physical scale of some sort (a physical stimulus scale) and 
the responses prompted by that scale. 

How do we operationally define pitch? Basically, the operations 
require only that we vary cycles per second of a sine wave and that 
reliable judgments concerning variations in highness or lowness of 
the sound be reported. It so happens that in this case the amount 
of phenomenal highness is directly related to cycles per second. Of 
course, as discussed earlier, the complete definition would require 
specification of the particular technique used to present the stimuli, 
e.g., constant stimuli, and perhaps further elaboration of this tech- 
nique if it deviates from the method as commonly understood. 

Now, as we can see, such operations not only define pitch but may 
establish the empirical relationship between cycles per second and the 
phenomenal change in sound we call pitch. The definition not only 
identifies the critical variable needed to produce the phenomenon, 
but also gives us a lawful relationship. In the simplest case of such a 
definition we would present only two variations in cycles per second 
and if the judgments in sound changes were reliably different we 
have our definition. And, in the crudest sense we have a relationship 
even with only two points along the physical scale having been 
presented. 

In defining a lower absolute threshold we first explore a physical 
stimulus scale in an area where we expect the subject to be able to 
perceive the stimulus part of the time and not perceive it at other 
times. Thus, we relate the scale to responding or not responding. 
Then, by arithmetical operations we determine a single value above 
which we expect the subject to respond more than 50 per cent of 
the time and below which less than 50 per cent of the time. This 
point is the threshold. Again, in actually presenting the stimuli we 
would use a particular psychophysical method and this would be a 
part of the elaborated definition. 
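
The "arithmetical operations" are not spelled out in the text. One simple possibility, offered only as a sketch, is linear interpolation between the two stimulus values that bracket 50 per cent responding. The function name, the intensity values, and the proportions below are all invented for illustration, and the choice of interpolation is an assumption rather than the procedure of any particular psychophysical method.

```python
def threshold_50(intensities, proportions):
    """Interpolate the stimulus value at which responding crosses 50 per cent.

    intensities: stimulus values on the physical scale, in ascending order.
    proportions: proportion of presentations on which the subject responded.
    Assumption: straight-line interpolation between adjacent points.
    """
    pairs = list(zip(intensities, proportions))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        if p0 <= 0.5 <= p1 and p1 > p0:
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("the proportions never cross 0.5")


# Hypothetical data from repeated presentations at five stimulus values.
intensities = [1.0, 2.0, 3.0, 4.0, 5.0]
proportions = [0.05, 0.20, 0.40, 0.80, 0.95]

threshold = threshold_50(intensities, proportions)
print(round(threshold, 2))
```

With these invented figures the crossing falls between the third and fourth stimulus values. A complete definition would still have to name the psychophysical method used to present the stimuli.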

The term closure has been used rather widely by some psychol- 
ogists. Our literary conception of the meaning of the term is that it 
represents a tendency to see incomplete forms as being complete. 
Now, can this concept be given operational definition? A study by 
Bobbitt (4) shows that indeed it can. He presented triangles with 
on behavior of a zero amount of a dimension is compared with the 
effect of some finite amount. If there is a reliable difference in 
behavior resulting from these two conditions, [...] 
to derive it define the phenomenon. The symbols, E/C, refer to 
experimental and control conditions; the experimental condition is 
the one having a finite amount of a given stimulus condition, the 
control condition, zero amount. The symbols have been used by 
Marx (S), but his interpretation of the concepts derived is some- 
what different from my interpretation. I will return to [...] 
at a later point in this book. It is sufficient to say at [...] 
operational definition of phenomena by these operations [...] 
used as long as control groups have been used, which is a good 
Once more I must introduce a word of caution, and ask you to 
think back to the previous chapter where we dealt with unitary and 
complex dimensions. If our scaling techniques allow us to infer a 
reliable dimension this dimension is given operational definition by 
the procedures. Furthermore, if this dimension is now manipulated 
as a stimulus dimension and behavior is reliably affected, the rela- 
tionship is given operational definition, whether named or not. But, 
if the dimension scaled is complex, as it may well be, the investigator 
must guard against concluding that he has demonstrated a funda- 
mental law based on a relatively unitary stimulus dimension. If the 
stimulus dimension is complex we must be aware that we might be 
able to break it down into more unitary dimensions. But, at the 
particular level at which an investigator works his operational 
definitions are sound; he simply should not be blind to the fact that 
further scaling operations may allow him to arrive at definitions 
of more unitary dimensions and thus indicate further research as to 
the relationship between these dimensions and behavior. If stimuli 
are reproducible, regardless of their complexity, and if reliable 
judgments of differences are made among these stimuli, then all 
criteria of operational definitions have been met. But let us not be 
myopic before the altar of operationism; let us recognize that it is an 
identifying device and that what one has identified is a matter for 
further research. 

In this section on S-R identification I have discussed operational 
definitions of phenomena which are relationships between behavior 
and either a physical or psychological stimulus dimension. In these 
cases the “input” on the stimulus side was positive. That is, a certain 
measurable amount or quantity of the dimension was used. The 
phenomenon was defined if a minimum of at least two points along 
the dimension was sampled. At the same time, and even with only 
two points, a very crude relationship is established. Normally, how- 
ever, in practice three or more points from along the dimension are 
used so that the relationship defined is more precisely given. The 
fact that at least two positive or finite amounts of the dimension are 
used contrasts this procedure with the final one which I will now 
discuss. 

S-R, E/C identification. The operations under this heading differ 
from the other two S-R types in that in the simplest case the effect 
type of definition is what I call a diagrammatic type of operational 
definition. In such definitions we have the outline of the experi- 
mental design needed to demonstrate the phenomenon. We can, of 
course, construct these definitions in the more common verbal form. 
These definitions become the “if-if-then” type of verbal definitions, 
and sometimes may get a little dangling. For retroactive inhibition 
we would say: “If under one condition only Task A is given, and if 
under another Task A is followed by Task B, and if the retention of 
A is better under the first condition than under the second, then 
we have defined retroactive inhibition.” Because I think the diagram- 
matic definition is somewhat less burdensome and perhaps somewhat 
more easily grasped, I prefer it and will use it almost exclusively in 
subsequent discussion. 

The second comment concerns specification of the direction of 
the difference between the experimental and control conditions. In 
some definitions (to follow) it is not necessary to specify the direc- 
tions of the performance difference. Clearly, however, in the case 
of retroactive inhibition, the control condition must result in better 
retention than the experimental condition. Indeed, if the experi- 
mental condition resulted in better retention than the control we 
would have defined retroactive facilitation. 
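
The role of direction can be made concrete with a short sketch. It assumes that "reliable difference" is judged by an exact permutation test on the difference in mean retention with a cutoff of 0.05; both the test and the cutoff are illustrative choices that the text does not prescribe, and the retention scores are invented.

```python
from itertools import combinations
from statistics import mean

ALPHA = 0.05  # hypothetical reliability criterion, not taken from the text


def exact_permutation_p(control, experimental):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = control + experimental
    n = len(control)
    observed = abs(mean(control) - mean(experimental))
    indices = range(len(pooled))
    extreme = total = 0
    # Enumerate every way of relabeling n of the pooled scores as "control".
    for chosen in combinations(indices, n):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-9:
            extreme += 1
    return extreme / total


def name_phenomenon(control, experimental):
    """Name the phenomenon from the direction of a reliable difference."""
    if exact_permutation_p(control, experimental) >= ALPHA:
        return None  # no reliable difference, so nothing is defined
    if mean(control) > mean(experimental):
        return "retroactive inhibition"
    return "retroactive facilitation"


# Hypothetical retention scores for Task A, invented for illustration.
control = [9, 8, 9, 10, 8]        # Task A only
experimental = [5, 4, 6, 5, 3]    # Task A followed by Task B

print(name_phenomenon(control, experimental))
```

The same procedures and the same response measure yield either name; only the direction of the reliable difference decides between them.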

2. I would like next to discuss the definition of frustration, and 
certain additional issues which it raises. In diagrammatic form, frus- 
tration may be defined as follows: 

Control: Goal oriented; No blocking 

Experimental: Goal oriented; Blocking 

If, as a consequence of these procedural differences, there is a reli- 
able difference in behavior, frustration is demonstrated, and the 
procedures define it. In this particular instance, further elaboration 
of what is meant by goal oriented and blocking will probably be 
necessary. That is, the investigator would need to specify what par- 
ticular goal or task has been set for the subjects and what particular 
technique was used for blocking attempts to attain the goal. And, 
of course, the particular response or responses measured will be a 
part of the definition. 

In studies of frustration, a number of different response measures 
have been used. Furthermore, some of these have been given names, 




years as a reliable phenomenon simply because it had not been 
operationally distinguished from [...] phenomena. The simple 
reason [...] differentiated was failure to use a control condition 
in carrying out the operations. An adequate definition of 
reminiscence requires the following procedures: 

Control: Learning: Retention Test [...] Retention Test 2 

Experimental: Learning: Retention Test 1 [...] 

The series of dashed lines in [...] that the time interval between 
[...] and retention test is longer for this condition [...] control 
condition. Now, if the retention on Test 2 is greater [...] 
experimental condition than for the control condition, reminiscence 
is demonstrated and the procedures define it. 

In the earlier work on reminiscence [...] control group was used; 
rather, reminiscence was said to [...] better on the second test 
[...] experimental condition alone. However, [...] retention test 
might well provide an additional learning [...]; Retention Test 2 
might have shown higher [...] because of this additional learning, 
and not [...] of any facilitating process which was presumed to 
take place [...] interval between the two tests. Indeed, if we knew 
the [...] additional learning provided by the first retention test, 
there [...] forgetting in the experimental condition. [...] (1) has 
shown that this is what happens. In other words, the operations had 
not clearly differentiated reminiscence from forgetting. I shall 
discuss later in somewhat more detail this [...]. 

5. I now wish to turn to [...] problems of definition which 
may be illustrated by the discussion [...] phenomena, conditioning 
and pseudoconditioning. [...] general definition of conditioning, 
using CS for conditional [...] and US for unconditional stimulus. 

                 Present CS-US series?    Test with CS? 

Control:                  No                   Yes 

Experimental:             Yes                  Yes 




produces better performance than the control, positive transfer is 
defined. 

I have said that one of the virtues of operationally defining con- 
cepts by the diagrammatic technique is that the basic experimental 
design is given by the definition. However, we should not be lulled 
into thinking that such definitions automatically protect us from 
errors in carrying out the operations. For example, in the definition 
of transfer it is quite apparent that if we use different groups for the 
two conditions they must not differ significantly in learning ability. 
Or, if we use the same subjects in both conditions, any potential 
differences in difficulty of the material must be balanced out. Let it 
be evident that an operational definition only makes it clear in 
general how you are finding what you are defining; it assumes 
experimental and statistical competence to carry out the operations 
so that no confounding of variables is present. 

4. While it may seem fairly evident in the illustrations given thus 
far that the control condition is an essential part of the defining oper- 
ations, this has not always been the case. I want to discuss two cases 
as illustrations of where failure to consider a control group as a part 
of the definition led to, or may have led to, the defining of phenom- 
ena which did not exist. 

In some of the earlier work on the Rorschach, someone colored 
portions of some of the cards with a garish red. In presenting these 
colored cards it was believed that the responses to them differed 
appreciably, indeed dramatically, from the responses to black and 
white cards. This alleged difference was called “color shock.” Now, 
it seems clear that to differentiate responses to colored cards from 
those to black and white cards, hence, define color shock, an appro- 
priate control must be used. This could be done as follows: 

Control Cards: Without color 

Experimental Cards: Same as control with color 

Now, if any difference occurs in the quantified responses to the 
cards, an operational definition of color shock is given. Actually, the 
investigator might specify that the difference must be in a certain 
type or kind of response. But until such operations had been carried 
out there could be no acceptable definition of color shock. 

In the field of retention, reminiscence masqueraded for several 
If the test performance under the experimental conditions is better 
than under the control, i.e., if more responses are made to CS, the 
procedures define the phenomenon. A very comparable set of oper- 
ations would define learning in a general way except that we would 
omit the specificity of the stimuli of the initial series and indicate 
only that practice trials were given on a task. In very few modern 
researches on typical learning problems is the control group actually 
used. The reason appears to be that learning (or conditioning), being 
such a pervasive phenomenon, would be known to occur in the 
experimental group and not in the control. In most learning situa- 
tions we would expect the control group to have a zero measure 
of performance on the test trials. For example, if the material were 
nonsense syllables it is highly unlikely that the control subjects 
would get any of the syllables correct when they had never before 
been exposed to them. In short, with some well-established phe- 
nomena, the control group is not needed to demonstrate the exist- 
ence of the phenomenon because it has been demonstrated so many 
times in the past. 

But now, turning back to conditioning as such, there is reason to 
believe that a control group becomes an essential part of the defin- 
ing operations even though the phenomenon of conditioning has 
been demonstrated many, many times in the past. 

I have insisted that science is a series of analytical steps. One of 
the purposes of analysis is to isolate unique phenomena. The prob- 
lem is somewhat comparable to that of reducing stimulus dimen- 
sions to as unitary a level as possible. On the response side, likewise, 
we must isolate unitary or unique phenomena. Suppose that the 
phenomenon defined as conditioning is not unique in the sense that 
it is constituted of two or more isolable phenomena which can be 
given independent definition. Now, on a strict operational level, the 
definition of conditioning as I have given it would not be misunder- 
stood; the operations are clear. But, as scientists engaged in an 
analytical enterprise, our definitions must keep pace with our analyt- 
ical research. If, therefore, we can break a phenomenon down into 
more elementary ones by appropriate differentiating operations, we 
must do so. Or, to say this another way; we break the variance 
down into as many components as there are discriminable operations 
that affect it. 
clearly differentiating operations are used. I want to discuss this 
problem at some length, and will use initially as an illustration the 
problem connected with a definition of maturation. One definition 
which has been widely used is that resulting from identical-twin 
studies. One twin is used in a control “condition,” the other in an 
experimental “condition,” as follows: 

                 Practice on specific skill    Test on this skill? 
                 (e.g., climbing)? 

Control:                    No                        Yes 

Experimental:               Yes                       Yes 

If performance of the experimental twin has improved from the 
beginning of practice to the tests, and if there is no appreciable dif- 
ference in the performance of the control and of the experimental 
twin on the test task, it has been said that maturation is demon- 
strated. The idea is that neuromuscular development sufficient to 
account for the behavioral change takes place without the specific 
practice. Unlike all of our previous definitions, the phenomenon 
rests on demonstrating no difference in performance on the test 
trials. As will be seen, my objection to the definition has nothing 
to do with the statistical impossibility of confirming the null hypoth- 
esis. My objection is that these operations do not allow definition of 
an independent phenomenon clearly distinguished from another 
well-established one. In fact, I would insist that insofar as the above 
type of operations have been used to define maturation, no such 
phenomenon exists in a scientific sense. 

A well-established phenomenon in motor learning is that of trans- 
fer of skill. With this phenomenon in mind let us look at the control 
condition used to define maturation. The control group subjects are 
not kept immobile; as a matter of fact, they are allowed a great deal 
of activity. They may run, jump, crawl, turn somersaults, and so on, 
but they are not allowed to practice climbing. It is quite reasonable, 
therefore, that there could be heavy transfer from these other activ- 
ities to climbing. If the control subjects, therefore, do as well as 
the experimental subjects on the test trials this might be attributed 
to transfer, and no new phenomenon need be defined. It becomes 
apparent that to give an acceptable definition of maturation by these 
types of operations the control group must be kept completely im- 
We may then state a general principle of precedent or priority. 
If the same response measure is used in the defining operations of 
two phenomena, and if the stimulus manipulations cannot be clearly 
differentiated, then the phenomenon which can be demonstrated (hence 
defined) in a situation where, by its literary conception, the other 
would not occur takes precedence. I do not 
think this rule of precedent will solve all definitional problems 
where stimulus manipulations conflict, but I think it will handle 
most of them. 

All of this discussion is concerned with differentiating phenomena 
when the same response measure is or was to be used; hence, differen- 
tiation must be based on the stimulus manipulations. This does not 
close the problem for these E/C type operations. For, there are 
instances in which the same stimulus manipulations are used for 
two or more phenomena and in which case the differentiation must 
come in the response measurements. The rule here is fairly straight- 
forward. If the same stimulus manipulations are used, phenomena 
are differentiated by different response measures if, and only if,
those response measures are poorly correlated. The only ambiguity 
in this principle is that we have no set value of the correlation co- 
efficient which can be used as a clean-cut criterion as to when and 
when not responses are said to be poorly correlated. 

We have previously given a definition of frustration, and saw 
how this general definition can subsume subphenomena, such as 
aggression or withdrawal. In these instances the stimulus manipula- 
tions used to produce aggression and withdrawal are the same; they 
are differentiated on the basis of response measures. Obviously, if 
these response measures correlate highly there is no basis for dis- 
tinguishing two phenomena. But, since they do not correlate highly, 
scientific analysis is aided by insisting upon definition of independent 
phenomena. 


A SUMMING UP 

In this final section I would like to bring together certain of the 
scattered comments in the chapter and add a few statements of 
appraisal. Early in the chapter I indicated the three major benefits 
which I believed accrued to psychologists accepting wholeheartedly

defining criterion as such which tells us whether or not a new
concept should be admitted. The only criterion which can be used is
a research criterion. This criterion, stated simply, is that no new 
concepts should be admitted if the phenomenon is already measured 
by what might appear to be a somewhat different set of operations. 
You will remember that test-construction procedures illustrate well 
the simple response-defined concepts. Almost an infinite number of 
tests could conceivably be constructed and if one stopped at that 
point, each would define a new concept. We have only one
protection against such an eventuality, namely, insisting that correlations
be determined among tests and not admitting a new concept if the 
characteristic involved has already been defined. That is, if the cor- 
relation is high between one test and another, the same concept 
should be used for both. The fact that we have a number of reli- 
able tests all called intelligence tests shows that this is working out 
to a certain extent. 
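
The correlation criterion just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, and the cutoff of 0.80 is an arbitrary placeholder, since no fixed value of the correlation coefficient is available as a clean-cut criterion.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return sxy / sqrt(sxx * syy)


def admits_new_concept(new_test, established_test, cutoff=0.80):
    """True if the new test correlates too weakly with the established
    test to be treated as a measure of the same concept.

    The cutoff is a hypothetical value chosen for illustration only.
    A strong negative correlation is also taken as 'same concept',
    hence the absolute value.
    """
    return abs(pearson_r(new_test, established_test)) < cutoff
```

In use, the scores of the same subjects on the new test and on each established test would be passed in turn; the new concept is admitted only if no established test correlates highly with the new one.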

While I would insist that this correlation criterion provides the 
only adequate protection against a wild proliferation of concepts in 
this area, I must quickly add that it is not simple to work out in
practice. Suppose we have a test which defines the concept of 
clerical aptitude. Then suppose another investigator constructs a test 
to define finger dexterity. It might never occur to this second in- 
vestigator that his test correlates very highly with the one defined 
as measuring clerical aptitude, although this might actually be the 
case. In short, there are hundreds of tests which have been con- 
structed to define certain concepts and it would be next to impossible 
for an investigator to correlate his new test with all of these. We 
may, therefore, expect that tests which actually measure the same 
thing will have different names applied to them. 

It is in the handling of this whole problem that I have mentioned 
that factor analysis makes a very strong contribution; factor analysis 
limits the number of characteristics of behavior which must be given 
independent definition. So the problem is not hopeless; it is only 
a huge research task. The only other hope is for occupational fatigue 
of test constructors. I must repeat again that we must have
independent response-defined concepts if the response measures for two
or more tasks or tests or situations do not correlate. Restricting the 
number of concepts defined by response measures is not an arbitrary

sented, how high the degree of learning is, and so on, if the specified
performance difference is found. All other factors mentioned may 
indeed be variables affecting the amount of retroactive inhibition 
but they are static variables as far as the defining operations are 
concerned. 

Finally, I am compelled to make additional comments partly by
way of preparation for later chapters, partly by way of keeping
some pesky terminology problems from getting out of hand, and
partly to correct any erroneous impressions that empirical concepts
are all nicely organized. What I have to say is concerned largely
with S-R types of phenomena. 

1. I have said, and will continue to say in later chapters, that at
the empirical level our research problem is to determine behavioral
phenomena and variables which influence them. I have also said that
an operational definition specifies the necessary operations needed
to demonstrate the phenomenon. Now this word “phenomenon”
occurs hundreds of times in this book; I don’t like the word but I
haven’t a good substitute. I think it is clear that when I use the word
I mean a reliable behavior event or change; it is an event whose
recurrence can be discriminated as such. When I say “phenomena
and variables affecting these phenomena” I am stating my own bias
for organizing research findings but I think I am also reflecting the
organizational structure of much research in psychology. Research
workers do center their work around one or two phenomena as they
go about determining the influence of variables on these phenomena.
But, to speak in this way is no more than that; that is, it is just a
manner of speaking. For any reliable behavioral event or change is
a phenomenon so that these “variables which influence phenomena”
do themselves define phenomena. So perhaps I should say
“central phenomena and associated phenomena” instead of “phenomena
and variables which influence them.” In any event I think
it is fair to say that many psychologists do think in terms of core
phenomena and that other phenomena revolve around them; these
others I will continue to refer to as “variables which influence them.”
This centrality is shown by titles to research papers, to section
headings in books, and so on. Thus we read: “extinction as a function
of amount of work;” “pitch as a function of intensity;” “intelligence
as a function of race;” “learning as a function of meaningfulness.”

For there is only one basic principle, namely, design the experiment
so that the effects of the independent variables can be evaluated 
unambiguously. The difficulty lies in implementing the principle in 
any specific research situation. I do not think there is any easy 
solution to this learning problem. I do not think, furthermore, that 
we can simply tell research workers to “watch out” for infraction of 
the rule and expect positive results. In order to learn we must prac- 
tice and practice and practice. We must learn what not to do as well 
as what to do. In the present work I can do no more than get one
started on this learning process; it is up to each of us to continue the 
practice. 

Essentially what I will do is give you illustrations of research 
errors. Then, I will note how each error may be corrected. Some of 
the illustrations will be from published reports; I use these reports 
because they are real and can be evaluated by all who choose to do 
so. As I said earlier, I think it is somewhat unfortunate that these 
should be found in print; it is at least unfortunate for contemporary 
psychology if it is going to be evaluated for scientific rigor. If there 
is a positive side it is that such published reports may in the long run
make better scientists out of all of us. If errors are to serve as a 
learning device, they cannot be hid. I doubt whether there is a single 
psychologist, actively engaged in research work, who has not at 
some time at least planned an experiment which did not meet critical 
standards. Most have probably executed such an experiment and 
some of us have mailed them to an editor. A critical colleague 
or a friendly editor (there are some) may have saved the research 
from being published. So, when I evaluate critically a piece of pub- 
lished research I do so without malevolence and with full awareness 
that many of us are in the fortunate position of having uninhibited 
colleagues and students who take fiendish delight in pointing out 
our blunders, usually before they become public. 

I have also had available another source of research reports which 
has provided me with a number of illustrations of errors on which 
to focus when designing research. As a consulting editor for the 
Journal of Experimental Psychology for six years I have had the 
privilege of reviewing scores of manuscripts. I cannot, of course, 
give references to these works. But I shall reconstruct them from my
notes as faithfully as I can. Some of these have been rejected for

important indeed, which only the active worker in the particular
area can give with the degree of expertness required. I am not
aiming at this level of research design or procedure.

I suppose if I get clearly impatient on the matter of research 
design it is in the case where before an experiment is done it is 
recognized that the answer to the question cannot be obtained or
a test of the hypothesis accomplished but the research is undertaken 
anyhow after uttering the soporific phrase “it is the best that can 
be done under the circumstances.” I would insist that the research 
should not be done under this aegis. Why do it if it cannot possibly 
solve a problem or answer a question? It is not unusual to carry out 
an experimental procedure and then discover that it does not
accomplish what we thought it would, but to do the experiment when we
know it won’t accomplish what we want to accomplish is a clear 
expression of research stupidity or an unwarranted faith in the 
virtues of science. 

It will be apparent that my presentation deals almost exclusively 
with the classical nomothetic approach in which groups of organ- 
isms are used. I have been unable to sense the revolution taking place 
in psychological research which others have seen. This alleged 
revolution deals with the study of the individual and is sometimes 
called idiographic psychology. The idea is that we should discover 
the laws holding for the individual, not the laws holding for the 
group. I cannot see how anyone can object to the study of a single 
individual; he may be studied intensively in the sense that many 
different relationships are determined for him or intensively with 
respect to only one particular phenomenon. The latter plan of study 
has long been used in research on sensory processes. Ebbinghaus used 
it for his studies of memory and Skinner uses it for the study of a rat.
I do not see that there is a systematic issue involved here unless 
those who champion idiographic analysis are trying to say that there 
is no commonality of laws from one organism to the next, in which 
case we will have as many sets of laws as we have people. I would 
insist that the laws and relationships about which we already know 
would deny this stand. So, where is the issue other than the
ever-present one in all kinds of research concerning the generalizability
of results? Nevertheless, since I may be missing something I would
refer you to two “pro” idiograph papers (2, 13), one “anti” (14),

discrepancies are relatively unimportant in the sense that verbal
conclusions are slightly askew but can be easily corrected and we 
retain the research as a sound contribution to the literature. At the 
other extreme are discrepancies which are lethal and there is no way
that a scientifically meaningful conclusion can be reached from the
procedures used. These cases are best exemplified by blatant
confounding of stimulus variables from different classes (environmental,
task, subject) so that behavior changes measured cannot be said 
to be the result even of variables within a given class. In between 
these two extremes are discrepancies which, to be charitable, reflect 
a nonanalytical conclusion in an area where development of knowl- 
edge is far enough advanced to disallow such conclusions. Such 
conclusions are disallowed because there is a confounding of the 
manipulated variable by another variable in the same class. What 
happens is that as the investigator manipulates one variable in a 
class, another identifiable factor in the same class also changes so 
that the phenomenon produced cannot be said to be due to one 
particular component; it can only be said to be due to the one or 
the other or the combined influence. A design error occurs if 
identifiable components of the manipulations are not isolated when 
it is quite apparent that they must be if analytical conclusions are 
to be drawn. Note that the confounding is between variables within 
the class in contrast to the confounding of variables from different 
classes as discussed above. With some justification, confounding of 
variables within a class is considered a less serious error than con- 
founding of variables among classes. However, we shall have plenty 
of opportunity to compare the two and you may arrive at a differ- 
ent assessment of relative seriousness. 

It is my belief that the major errors in psychological research lie
in these two kinds of confoundings and my effort is directed toward 
extensive discussion of how such confoundings have occurred and 
how we go about trying to avoid them. The organization scheme 
for this presentation can now be outlined. It will be remembered 
that I identified (Chapter 2) four types of variables which may be
manipulated, namely, environmental, task, instructional, and subject
variables. For this discussion I will omit the instructional variable as
an independent class and include it under the environmental
variables. I have said above that when we manipulate a variable within a

One of the most persistent and deadly errors which occurs in
research is that when manipulating an environmental or task variable,
subject variables also change. This is the problem of getting and 
maintaining equivalence of groups of subjects and will be the first 
topic; this refers to cells 3 and 6 in the table. Cell 9 is a special case; 
it refers to research in which a subject variable is manipulated but 
in which the results are confounded by the simultaneous variation 
of other subject variables. A consideration of cells 3, 6, and 9 will 
complete this chapter. In the next chapter I will consider in order 
cells 1, 2, 4, 5, 7, and 8.

While I consider the various stimulus confoundings to be the 
major source of error in psychological research there are a number 
of other problems involved in the research process which, when not 
adequately handled, may also be said to constitute errors. These will 
be considered following the extended material on stimulus con- 
founding. 

THE PROBLEM OF EQUIVALENT GROUPS WHEN 
MANIPULATING ENVIRONMENTAL AND 
TASK VARIABLES 

In the usual situation where environmental or task variables are 
manipulated, two or more groups are treated differently with the 
aim of arriving at a conclusion concerning the effects of the differ- 
ent treatment. To reach this conclusion, the skill or abilities of the 
groups per se, and the experiences these groups have, should not
differ except for that inserted by the investigator as the experimental
treatment. A major source of error in carrying out research is failure 
to appreciate the importance of this simple principle. Obviously, 
causes of differences in ability levels and interaction of differences 
in ability level with differential experimental treatment is a
legitimate area of study, but I am concerned now with the case where
the logic of the research rests on equivalent groups and the
investigator is interested only in the effects of manipulating an
environmental or task variable. As with most of the errors we are studying,
we have obvious infractions and others that are quite subtle.

I have seen experimental reports which, simplified, had methods
of forming groups about as follows. One entire class of elementary 
psychology students, meeting each day at 8:30 a.m. for class, is given 
one experimental treatment. Another entire class, taking the same 
course but meeting at 1:30 p.m., is given a different treatment. 
Differences in behavior are then said to be a function of differences 
in treatment. I am sure that you can think of a number of reasons 
why we would not expect these two groups to be the same as if we
had thrown all the students in both the classes together and assigned 
them to the two treatments on a random basis. Depending on the 
nature of the performance being measured we may even point out 
specific reasons why we would expect one group to be superior to 
the other. I do not think we can accept such research unless the 
investigator shows that the two groups did not differ on a perform- 
ance that is relevant to (correlated with) the performance measured 
during or following differential treatment. Thus, in the above illus- 
tration, the investigator might use a pretest to show that the groups 
did not differ on, say, attitude toward authority before he intro- 
duced differential treatment designed to change attitudes toward 
authority. This is what I mean by the investigator supplying sup- 
porting evidence of equivalence of groups when his method of 
deriving his groups is suspect as far as the random-groups logic is 
concerned. 

I have also seen studies in which students in one school, say Com- 
merce, are used for one condition and students in Liberal Arts for 
another without adequate supporting evidence that the groups did 
not differ appreciably on relevant variables. I have also seen re- 
search reports in which subjects were obtained on a semivolunteer 
basis and in which the first 50 subjects to volunteer were placed in 
one condition, the second 50 in the next, and so on. We cannot com- 
promise this issue; the method of assigning subjects must be assuredly 
random or the investigator must present supporting evidence that 
whether random or not the effect was the same, namely, that the groups did not
differ significantly on variables relevant to the skills required for 
the experimental task. The issue is such a simple one that I some- 
times think we overlook it in our concern with the complexities of 
the details of the treatments given the various groups and in our
concern with statistical analyses of the data obtained.

in which we used a practice task and an experimental task with the
idea of matching groups on the practice task. Even though I believe
that anyone would agree that these tasks had high apparent com- 
monality the correlation between the two was found to be much too 
low to justify use of the practice task as a matching task. Thus,
even though the groups did not differ appreciably on the practice 
task the small differences which we observed on the experimental 
task may be, at least in part, a function of unequal ability and not
the experimental variable.

operations and should have been presented as a matter of course to
support the implication that the subjects were assigned at random. 

If I may summarize, I think there are two principles we should
follow when reporting research in which random assignment of 
subjects has been used: 

1. State exactly the procedure used to assign subjects to the
groups, e.g., random numbers, alternation, etc.

2. If possible, that is, if allowed by the experimental design, give
data which support the expectation of equivalence of groups when 
random assignment is made. 
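
The two principles can be illustrated with a minimal sketch. It assumes a pooled list of subject identifiers and a pretest score for each subject; the function names, the use of a seeded shuffle as the assignment procedure, and the pretest itself are assumptions made for the illustration, not a prescribed method.

```python
import random
from statistics import mean


def assign_at_random(subject_ids, n_groups=2, seed=0):
    """Pool all subjects, shuffle them with a recorded seed, and deal
    them out to the groups in rotation. Reporting the seed and this
    procedure satisfies the first principle (state exactly how
    subjects were assigned)."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]


def pretest_means(groups, pretest):
    """Mean pretest score per group: data of the kind the second
    principle asks for in support of equivalence of groups.
    `pretest` maps each subject identifier to a score."""
    return [mean(pretest[s] for s in group) for group in groups]
```

Group means on a relevant pretest that are close together support, but do not prove, the expectation of equivalence; a formal comparison of the groups would still be needed in a report.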

MATCHED GROUPS 

If an investigator does not have a way by which subjects can be 
assigned at random, or, even if he does, he may prefer to use a
matching procedure whereby the groups are equated on a relevant 
skill before introducing the experimental treatments. I have discussed 
details of such matching procedures elsewhere (/y) and will not 
repeat them in full here. I wish only to hit some critical points which 
still are sometimes overlooked in current investigations. 

In the first place, matching must be on a relevant task, skill,
performance, or whatever is involved in the research. This means simply
that the matching scores must be related to (correlated with) the
performance that is measured during or following the introduction
of the independent variable. Just how high this relationship must be 
is again a question which cannot be given a general answer for the 
same reason as discussed in Chapter 2 when response reliability was 
considered. Some investigators still have a tendency to assume that
tasks are correlated without a statistical demonstration of the cor- 
relation. If the matching and experimental tasks are not correlated we 
are forced to resort to an assumption of randomness which will 
usually be a highly questionable assumption since subjects are lost 
during the matching procedure. Matching subjects on intelligence 
before introducing them to differential treatments in a rote-learning 
task is questionable but it has occurred rather frequently without 
any apparent concern that intelligence test scores and rote-learning 
skill may be poorly related. 
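
One common way of forming matched groups can be sketched as follows. This is an illustration only, under the assumption that matching is done by ranked pairs with random assignment within each pair; the function name and data layout are hypothetical. The sketch does not establish relevance: the correlation between the matching scores and the experimental performance must still be demonstrated separately.

```python
import random


def matched_groups(matching_scores, seed=0):
    """Form two groups matched on a pretest.

    `matching_scores` maps subject identifier -> matching-task score.
    Subjects are ranked on the score, adjacent subjects are paired,
    and the members of each pair are assigned to the two groups at
    random. With an odd number of subjects the lowest-ranked subject
    is dropped, one of the ways subjects are lost during matching.
    """
    rng = random.Random(seed)
    ranked = sorted(matching_scores, key=matching_scores.get, reverse=True)
    group_a, group_b = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # chance decides which member goes where
        group_a.append(pair[0])
        group_b.append(pair[1])
    return group_a, group_b
```

The design choice here is that matching controls the group means on the matching task while the within-pair shuffle keeps the investigator from deciding which member of a pair receives which treatment.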

We have performed some studies on concept learning (e.g., lO

differing events associated with Army versus Navy experiences.
And I think that if such a difference were found we would be
justified in reaching this cause-effect conclusion although we would
realize that the particular nature of the differences in experiences 
could not be specified just from the data we have. This hypothetical 
experiment, in short, would meet our standards of design. But now 
let us see what is actually done in these ex-post-facto experiments. I 
will use as illustration an experiment concerning the length of time 
spent in Boy Scouts as it related to community adjustment (4).
It is a perfectly legitimate scientific question and an important 
social question to ask whether or not being in Boy Scouts influences 
later community adjustment or community participation as an adult. 
In this particular research, done in 1938, two groups of boys were 
separated based on the number of years spent in Boy Scouts through 
1934, at which point all boys had terminated their association with 
the Scouts. One group had spent an average of 4 years, another 1.4
years in Scouts at time of termination. In 1938 these boys were
measured on several factors related to community adjustment,
community participation, and so on. It was clearly the intent of the
research to establish a causal relationship between length of time in
Boy Scouts and community adjustment. Obviously, not all boys 
were available in 1938 so the investigator resorted to a matching 
procedure on the boys who were available. (The problem of general- 
ization of research findings is a matter we shall consider later.) The 
groups were matched on several factors, and even though the investi- 
gator did not report the relevance of the matching variables to com- 
munity adjustment, this is not the principal point I wish to make. 
Are we to assume that the fact that one group spent 4 years and the 
other 1.4 years in Boy Scouts is due to sheer chance? I think we 
could agree that this would be highly improbable. The groups must 
have differed on one or more variables which are responsible for one 
group’s being in Boy Scouts only 1.4 years in one case and 4 years in 
the other. If, then, we measure these boys 4 years later, we are prob- 
ably measuring the continued influence of these factors and perhaps 
not at all any influence of different lengths of time spent in Boy 
Scouts. In short, we do not know that, and we do doubt that, the
two groups of boys were random samples from the same population

DESTRUCTION OF EQUIVALENT GROUPS AS A
CONSEQUENCE OF RESEARCH PROCEDURES

Under this heading I want to consider a number of different ways
by which the equivalence of our groups may be vitiated by experi- 
mental procedures which produce a loss of subjects. I shall consider 
enough samples so that we can be sensitized to the fact that these 
loss-of-subject situations may be varied and lethal. In general, the 
situations to be discussed have first used random or matched groups 
before the introduction of the experimental variables. We ask the 
question of what effect does the experimental treatment, producing 
loss of subjects, have on our results. 

1. Suppose we were going to do a study on speed of rote learning
as a function of meaningfulness of the materials to be learned. In the 
simple case we would construct lists which in so far as we can tell 
differ only on meaningfulness. Again, to keep it simple, let us con- 
sider only two levels of meaningfulness, hence two groups of sub- 
jects, one assigned to one list and one assigned to the other. Suppose 
we use a performance criterion, say, one perfect recitation. What
we expect to find is that the list of low meaningfulness takes longer 
to learn than the list of high meaningfulness. Inevitably in such 
studies we find that some subjects will be unable to reach the 
criterion; they are unable to learn the list to which they were
assigned. Let us assume that the first subject who comes to the
laboratory is assigned to List 1, the second to List 2, the third to
List 1, and so on. I think we would accept this as a method of
assignment which should result in equivalent groups if factors such as
time of day, experimenter, and so on, were equalized. When a sub- 
ject fails to learn we assign the next subject to that list and proceed 
as if the subject had not been lost. We complete the experiment and 
discover that the two groups did not differ in terms of mean number
of trials to learn the two lists. We might conclude that meaningful- 
ness as manipulated here was not a significant variable. On second 
thought, even with this brief description, I am sure that no one 
would accept such a conclusion without additional data. In this 
particular case we would certainly express an interest in the number 
of subjects who were lost for failure to learn each list. If the number 
of subjects unable to learn each list was roughly the same we would

tors are essentially equalized for all groups. In the particular study I
am using for illustration (/), one of the experimental variables was
the amount of weight on a Skinner-box lever. Thus, groups of rats 
had to work differentially hard in order to be rewarded. A common 
criterion of performance was imposed on all groups. It was dis- 
covered that the harder the rat had to work (more objectively, the 
bigger the weight he had to push) the less likely was he to reach the 
performance criterion. Hence, the loss of rats was directly related to 
size of weights. More and more rats were added until all groups had 
the same number of rats which had completed the task. But, it would 
appear that the groups now are no longer equal on all relevant vari- 
ables. The heavier the weight the greater the likelihood that the rats 
remaining had greater strength or skill in bar pressing. If the investi- 
gator wished only to conclude that differences in original perform- 
ance would result from differences in weight his evidence is over- 
whelmingly positive. However, in this case, extinction measures 
were taken and it is quite possible that these extinction measures 
represented a confounding of differences in ability and differences in 
the experimental variable. 
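
The selection produced by replacing lost subjects can be made concrete with a small deterministic sketch. Everything in it is hypothetical: the ability scores, the rule that an animal reaches criterion only if its ability equals or exceeds the demand of the weight, and the use of one pool order for both conditions are simplifying assumptions, not data from the study discussed.

```python
from statistics import mean


def completers(abilities, demand, n_needed):
    """Run animals in order, replacing each one that fails to reach
    criterion, until n_needed animals have completed the task.
    Returns the abilities of the completers and the number lost."""
    done, lost = [], 0
    for ability in abilities:
        if len(done) == n_needed:
            break
        if ability >= demand:
            done.append(ability)
        else:
            lost += 1
    return done, lost


# Hypothetical ability scores, in the order the animals are run.
pool = [3, 7, 5, 9, 2, 8, 6, 4, 10, 1, 7, 9, 5, 8, 6, 10]

light, lost_light = completers(pool, demand=2, n_needed=4)
heavy, lost_heavy = completers(pool, demand=7, n_needed=4)

print(mean(light), lost_light)  # mean ability and losses, light weight
print(mean(heavy), lost_heavy)  # mean ability and losses, heavy weight
```

Both groups end with the same number of animals, but the heavy-weight group is made up only of the more able ones, so any later measure (such as extinction) compares groups that differ in ability as well as in treatment.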

3. A number of studies have been performed on learning as a 
function of age. There is opportunity in these studies for selection 
of subjects as a function of age so that the results may be biased, 
especially at the upper-age levels. A study investigating this relation- 
ship may be done about as follows. The investigator goes into a 
community and takes a sample of the people in various age ranges, 
say, 6-10, 11-15, 16-20, and so on. To each group he administers one
or more learning tasks and then plots learning as a function of age. 
Results of such studies have been fairly consistent, namely, learning 
performance increases up to about the age of 20, remains fairly level 
to about 40, and then shows a very gradual decline. Of course, the 
shape of the curve varies as a function of the particular task but 
the relation indicated has some generality. Now it seems clear that 
the results are straightforward as far as the conclusion between age 
of those subjects used and learning is concerned. If the sampling 
from each age range is random, the curve is representative of the 
populations of each age range. However, I think there is a real doubt 
as to whether it represents only a relationship between age per se
and learning. Let us see why this might be.

5. As I mentioned earlier, experimental conditions which destroy
equivalence of groups may take place in a great variety of

male versus female subjects. So, we divide our total group into
subgroups of males and females with the intention of comparing
retention. We should realize that in doing this these groups should not
differ significantly on relevant variables for retention, e.g., degree of 
learning. Let us take another illustration which shows another facet 
of this problem. 

Assume we did a study on spatial generalization (e.g., 5). The 
subject is faced with a row of seven lights. He is told that each time 
the center light comes on he is to press a key as quickly as he can. 
But at various times we light lights other than the center one to see 
if the subject responds. After a large number of trials we can plot 
the number of responses made to each light and this would be 
expected to show a decreasing frequency of response from the
center light out on both sides, i.e., a gradient of spatial generalization.
Another measure of generalization that might be used is the latency 
of response. More specifically it is expected that the latency gradient 
will be roughly a reciprocal of the frequency gradient, i.e., the more 
generalized the response the longer the latency or to say it another 
way, the fewer the responses the longer their latencies. 

In such an experiment as this we might calculate the mean latency 
for responses to each light independently (using the number of 
responses as N). If we did we would probably find that our expecta- 
tion was not supported; that is, the more generalized responses might 
not have longer latencies than the less generalized responses. Indeed, 
the more generalized responses might even have shorter latencies. 
But, we would note that this was an inappropriate means of handling 
the latency data because we may have a subject selection process 
involved. Each subject is not represented at each light so that 
those subjects who did respond to the extreme lights may have very 
fast latencies, those who didn’t may have very long latencies. So, 
when we calculate means based on all responses at each point (for 
each light) we are using subjects for the different points who have 
different “natural” latencies. The proper way to handle such data
would be to use the latency for the center light (correct light) as
a reference point and figure deviations from that for each subject 
for the generalized responses. In any event, we simply cannot get 
mean latencies for each position because of the subject selection 
which this method introduces. 
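
The selection problem can be made concrete with a minimal sketch. The
data below are invented purely for illustration (they are not taken from
the study cited), and the function names are hypothetical: one habitually
fast subject responds to an extreme light, one habitually slow subject
never does. Pooling all responses at each light makes the generalized
response look faster than the correct one; referring each subject's
latencies to his own center-light latency shows the expected lengthening.

```python
from statistics import mean

# Hypothetical latencies in milliseconds: subject -> light -> latencies.
# "center" is the correct light; "far" is an extreme light.
latencies = {
    "fast_subject": {"center": [200, 210, 190], "far": [260, 240]},
    "slow_subject": {"center": [400, 410, 390], "far": []},  # never responds to "far"
}

def pooled_mean(data, light):
    """Mean over all responses to a light, ignoring who made them."""
    pooled = [x for subj in data.values() for x in subj[light]]
    return mean(pooled)

def mean_deviation_from_center(data, light):
    """Mean, over subjects who responded to `light`, of
    (own mean latency at `light`) - (own mean latency at the center light)."""
    devs = [mean(s[light]) - mean(s["center"]) for s in data.values() if s[light]]
    return mean(devs)

pooled_center = pooled_mean(latencies, "center")  # mixes fast and slow subjects
pooled_far = pooled_mean(latencies, "far")        # only the fast subject contributes
deviation_far = mean_deviation_from_center(latencies, "far")
```

With these made-up numbers the pooled mean at the extreme light is lower
than at the center light only because the slow subject drops out, whereas
the within-subject deviation is positive.
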
relevant subject variables. Now, if the same subjects are used in 
more than one condition when a task or environmental variable is 
being manipulated the same rule holds, namely, the subjects must 
not differ on relevant subject variables when different conditions are 
presented them. But it is a fact that as a result of having one or more 
experimental conditions the subjects do differ when presented a 
subsequent condition. There is no satisfactory way to prevent these 
subject changes; therefore, if the subject is to serve in more than one 
condition the experiment must be designed so that these changes in 
the subjects will not differentially influence conditions when the 
orders of conditions for all subjects are considered collectively. 
I call these changes in the subjects progressive errors and the method 
of handling them is some form of counterbalancing. Perhaps “pro- 
gressive errors” is a misnomer; actually the term refers to the in- 
fluence of behavior changes which occur as a consequence of 
continued experience with successive samples of the same class of 
materials or tasks. These behavior changes are usually said to be 
the result of practice and fatigue and are often referred to as practice 
effects and fatigue effects. It is an empirical fact that if a subject per- 
forms or practices on tasks which are relatively new to him, his 
performance will improve with continued practice. It is also a 
fact that sustained performance on a task may lead to decrements in 
performance and such decrements might be attributed to fatigue. 
In experimental work we can usually avoid any appreciable change 
attributable to fatigue by limiting the experimental time at any one 
session. It will be a rare situation, however, in which the investigator 
can say with confidence that there were no behavior changes at- 
tributable to practice. It is therefore extremely important that we 
recognize these experimentally irritating behavior changes which 
occur with successive practices; they will bias or distort the be- 
havior which we wish to attribute to the manipulated variable unless 
our scheme of conditions is so arranged that the changes (whether 
increments or decrements) will fall equally on all conditions of the 
experiment. 

In many kinds of research problems the investigator has a choice 
as to whether he will use the same subjects in all conditions of an 
experiment or whether he will use a different group of subjects for 
extraneous stimulation on perceptual judgments. The judgment re- 
quired was of verticality of a rod. This rod was luminescent and 
was presented to the subject in a dark room. The rod was sometimes 
tilted left, sometimes right, and the subject directed the experi- 
menter to adjust the rod until he (the subject) judged it to be 
vertical. These adjustments were made under five different condi- 
tions. In one condition the subject was given a mild shock to the left 
neck muscle while directing the adjustments of the rod; in another 
he was given a shock to the right neck muscle. In a third condition 
he received mild auditory stimulation through the left ear while 
making the judgment and in a fourth condition the stimulation was 
in the right ear. Finally, the fifth condition was a control in which 
available in other sources, I shall no more than mention them and 
make some brief evaluative statements. 

1. Complete counterbalancing, in which each condition occurs 
equally often at each stage of practice and each condition precedes 
and follows all other conditions. 

2. Some form of Latin square or systematic randomization, in 
which each condition occurs equally often at each stage of practice 
but all conditions do not precede and follow all other conditions. 

3. Randomization, in which the order of conditions for each sub- 
ject is assigned at random. 

So far as I can tell, there is little to choose between the first two 
methods, except on practical grounds of number of subjects avail- 
able and number the investigator wants to include. The number of 
subjects required for complete counterbalancing is r factorial, where 
r is the number of conditions. When we reach five conditions 120 
subjects are needed, and with six, 720. With Latin squares or deriva- 
tives thereof the number of subjects is simply some multiple of the 
number of conditions. 
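
The arithmetic behind these numbers, and the sense in which a Latin
square balances stage of practice, can be sketched in Python. The
function names and the cyclic construction are illustrative assumptions,
not a procedure taken from the text; a cyclic square is only one of many
possible Latin squares, and the count for complete counterbalancing
assumes one subject per order.

```python
from math import factorial

def orders_needed_complete(r):
    """Number of distinct orders of r conditions (r factorial)."""
    return factorial(r)

def cyclic_latin_square(r):
    """r x r cyclic square: row i is the order 0..r-1 rotated by i.

    Each row is one subject's order of conditions; each column is a
    stage of practice. Every condition occurs once at every stage, but
    a given condition is always followed by the same neighbor, so not
    all conditions precede and follow all others.
    """
    return [[(i + j) % r for j in range(r)] for i in range(r)]

square = cyclic_latin_square(5)  # 5 orders instead of 120
```

With five conditions this gives 5 orders (or any multiple of 5 subjects)
in place of the 120 required by complete counterbalancing.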

The third method (randomization) probably should not be used 
when the number of subjects is small. One need not mortgage his 
soul to randomization in this case because the same effect as ran- 
domization can be achieved by systematically ordering the condi- 
tions by either of the first two methods. Strange as it may seem, 
randomization of conditions may be most easily justified when 
(as in many sensory experiments) data from a single subject con- 
stitute the entire data from a series of conditions. However, in these 
cases the subject is given many trials under a single condition so 
that effectively it is as if many subjects were used and each given 
one trial on each condition. For example, if there are two conditions 
and 100 trials for each condition, randomization of the order of the 
200 trials should result in the effective balancing of progressive 
errors. It would be equivalent (for balancing purposes) to giving 
100 subjects a single trial on each condition in which the order of 
the two conditions was determined on a random basis. 
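
The two-condition example can be sketched as follows. This is a minimal
illustration under stated assumptions: the condition labels "A" and "B",
the function names, and the fixed seed are all hypothetical choices made
so that the sketch is reproducible. The mean trial position of each
condition serves as a rough index of how far along in practice that
condition fell on the average.

```python
import random

def randomized_trial_order(n_per_condition, conditions=("A", "B"), seed=None):
    """Return a randomly ordered list of trials, n_per_condition of each condition."""
    rng = random.Random(seed)
    trials = [c for c in conditions for _ in range(n_per_condition)]
    rng.shuffle(trials)
    return trials

def mean_position(trials, condition):
    """Average serial position (1-based) of the trials given under `condition`."""
    positions = [i for i, c in enumerate(trials, start=1) if c == condition]
    return sum(positions) / len(positions)

order = randomized_trial_order(100, seed=1)
mean_a = mean_position(order, "A")
mean_b = mean_position(order, "B")
```

If the randomization has balanced progressive errors reasonably well,
both means should lie near the midpoint of the series (100.5); with a
small number of trials they could differ appreciably, which is the
reason for the caution about small numbers of subjects.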

So much by way of background. Turning now to specific re- 
search, I want to give you two published illustrations of the failure 
to balance progressive errors. 

An experiment (17) was concerned with the effects of certain 
with continued trials. If it did here we have the effects falling differ- 
entially on the conditions and we, therefore, do not know whether 
the differences among conditions as measured represent the effects 
of the conditions manipulated or the effects of these progressive 
changes or both. I think that in view of the direction of differences 
actually found the investigators might well argue that all differences 
could not be accounted for by progressive errors but it seems to me 
they would have a difficult time defending the proposition that the 
magnitude of the differences as found was not differentially in- 
fluenced by progressive errors. However, no such defense would be 
necessary if the precaution of balancing the order of the conditions 
had been taken in the first place. In this case the balancing could 
have been accomplished within the conditions for each subject or 
among the 40 subjects who were used. This same failure to balance 
for progressive errors occurs in a second experiment by the same 
authors (19) but in a third experiment (18) the balancing is nicely 
accomplished by a partial balancing within a subject’s conditions 
and a further balancing among subjects. 

The second investigation (10) which I wish to discuss as an illus- 
tration of failure to balance progressive errors may be more serious 
than the above largely because the tasks used are known to produce 
large progressive changes in performance as a result of practice. 
The investigation was concerned with the retention of verbal lists 
as a function of two variables, namely, the length of list and degree 
of meaningfulness of items in the lists. Length of lists was varied 
four ways, namely, 10, 20, 30, and 50 words in a list. There were 
eight levels of meaningfulness to which I will refer with the num- 
bers 1 through 8. In the procedure used, a list was presented to the 
subject, one word at a time, and immediately after the last word the 
subject wrote down as many words as he could remember. There 
were 20 subjects. All subjects learned and recalled all 32 lists (four 
lengths with eight levels of meaningfulness for each length). The 
order of presenting the lists, exactly the same for all 20 subjects, was 
as follows: 

10 words long, all 8 lists in the order 1 through 8 

20 words long, all 8 lists in the order 1 through 8 

30 words long, all 8 lists in the order 1 through 8 

50 words long, all 8 lists in the order 1 through 8 
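
The confounding in this fixed order can be tabulated directly. The
following is only an illustrative sketch (the variable and function
names are hypothetical): it lays out the 32 lists in the order given
above and computes the average stage of practice, that is, the average
serial position among the 32 lists, at which each length and each level
of meaningfulness was presented.

```python
lengths = [10, 20, 30, 50]
levels = list(range(1, 9))

# The single order used for all subjects: every level within each length in turn.
presentation_order = [(length, level) for length in lengths for level in levels]

def mean_stage(order, index, value):
    """Average 1-based serial position of the lists whose field `index` equals `value`."""
    stages = [pos for pos, cell in enumerate(order, start=1) if cell[index] == value]
    return sum(stages) / len(stages)

stage_by_length = {n: mean_stage(presentation_order, 0, n) for n in lengths}
stage_by_level = {m: mean_stage(presentation_order, 1, m) for m in levels}
```

The 10-word lists fall on the average at stage 4.5 and the 50-word lists
at stage 28.5, so length of list varies together with amount of prior
practice; the levels of meaningfulness are likewise spread from an
average stage of 13 (level 1) to 20 (level 8).
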
There are probably a number of reasons why experiments involv- 
ing manipulation of environmental factors to discover the relation- 
ship with relatively permanent subject characteristics have been few 
in number. Undoubtedly, one factor is that they do require long 
periods since the assumption is that these relatively 
sents no new general problems. So, therefore, it is [...] para- 
digm, where subjects are initially separated on the [...] 
amounts of a characteristic and then tested in a [...] 
am concerned with in this section. Contrary to my [...] 
previous and subsequent sections, the discussion [...] 
pository with little reference to [...] 

When we manipulate a subject variable [...] recog- 
nized that we are dealing essentially with the [...] 
when we wish to impute [...] research. My approach 
to the problem will be somewhat different than in [...] 
since I am interested in conclusions which [...] 
Viewed in terms of the design of the [...] 
hypothetical problems to show the issues in [...] 
and then we shall search for solutions. [...] 
preview that the [...] difficult one that we 
can really only solve it by [...] analytical researches dictated 
by working hypotheses. These [...] 
from the attempt to relate subject variables to fundamental behav- 
ioral phenomena. I use the word [...] in the sense of 
well-established phenomena. 

As a starting point let us consider a straightforward empirical 
study which relates the subject variable of chronological age to 
rigidity. (In a literary sense, high rigidity represents difficulty of 
which take place in the subject’s abilities as a consequence of the 
experimental treatments. There is a host of experimental situations 
in which we must deal with biases which the subjects bring to the 
situation. If these biases can favor one condition over the other and 
if we are not interested in the biasing effect, steps must be taken to 
eliminate these differential effects. Or, we might express this by say- 
ing that subject variables (biases) are not equal for two or more con- 
ditions of treatment and we want them to be equal. However, I find 
that these biases are usually involved in a confounding manner be- 
cause of certain task characteristics. Therefore, I prefer to wait to 
discuss experimental errors resulting from these biases when I con- 
sider confoundings by task variables. 


CONFOUNDING BY SUBJECT VARIABLES WHEN 
MANIPULATING SUBJECT VARIABLES 

In Chapter 2, I indicated two general ways by which subject 
variables may be investigated. First, we may manipulate conditions 
to discover what influence such manipulations have on specified sub- 
ject characteristics. Thus, we might vary the amount of preschool 
training to see what influence this has on mental age. Secondly, we 
may choose groups of subjects who differ on some specified dimen- 
sion (unitary or complex) and test for other differences in behavior. 

The first method, that of determining causal factors lying behind 
individual differences on particular characteristics, actually involves 
the manipulation of environmental variables. Therefore, issues which 
have been discussed and others which will be discussed later con- 
cerning the manipulation of environmental variables will apply to 
this paradigm. If one wants to be really fussy one can say that the 
manipulation of an environmental variable is undertaken to discover 
the influence of the variable on subject capacities or skills. However, 
I have kept this research situation separate, not because it poses 
peculiar problems, but because it is concerned with modifications 
of relatively permanent skills and capacities and, furthermore, the 
research usually extends over long intervals of time. In contrast, the 
association one has when thinking of the typical manipulation of 
an environmental variable is a short-term study relatively uncon- 
cerned with modification of permanent skills of the subject. 
changing habits, low rigidity quite the opposite; in the actual ex- 
perimental situation several different techniques have been used to 
give rigidity operational meaning.) To do the research we sample 
different chronological ages and get measures of rigidity for the 
sample at each age. Suppose we found a positive linear relationship 
between age and rigidity. What do we conclude besides the fact 
that age and rigidity are directly and linearly related? Perhaps we 
would not care to conclude anything else; after all, we have laid 
bare a relationship and science is a search for relationships. Of course, 
if we do obtain the relationship as indicated we have a new way to 
measure or diagnose age. That is, knowing the rigidity score for a 
person we can tell his chronological age. The error in our estimation 
of chronological age based on rigidity scores may be somewhat 
greater than we would obtain by examining birth records or asking 
people what their ages are, but nevertheless we can predict age from 
rigidity scores. If I seem facetious it is only to emphasize the weak 
nature of the conclusion at which we have arrived by our single 
piece of research. As scientists engaged in a perpetual attempt to 
reduce relationships to basic cause-effect relationships, we would be 
quite discontented with stopping our research at this point where 
all we know is that there is a positive relationship between rigidity 
and chronological age. We know that chronological age is merely 
a convenient dimension of time and we must look for other changes 
which occur with time. But let us get away from this illustration for 
a moment to show that it is not an artificial one. 

One of the few substantial findings in the area of problem-solving 
is that men do better on such problems than women. No scientist 
that I know of is content with this finding; rather, it raises the prob- 
lem as to why men are better. Some might suggest that there is some 
genetically based difference which leads to the behavioral difference 
and then set about to search for this genetic differential. Others may 
attempt to relate this to experiential differences, thus relating it to a 
process about which we already know. The important point is that 
differences in behavior related to subject variables only start re- 
search, for in the typical case these differences must be related to 
more fundamental behavioral processes. 

Let us turn back to the rigidity illustration. Having related rigidity 
and chronological age, we note that up to a certain point mental age 
schizophrenia and rigidity? (I shall not consider problems peculiar 
to this illustration such as the matter of making contact with the 
severe cases, since I am using this problem only as an illustration of 
a general design issue.) First we must satisfy ourselves that a differ- 
ence in rigidity was not one of the criteria used in making the 
diagnoses of severity of illness. For, if this is true, and we find differ- 
ences in rigidity on our experimental task, all we can say is that 
performance on the experimental task confirms the reliability of the 
diagnosis and perhaps adds a little to our idea of the generality of 
rigidity in the individual. (This situation parallels the illustration 
given in the previous chapter concerning “effect” of length of time 
spent in Boy Scouts on community adjustment.) Without going into 
any detail, what we want to have is a working hypothesis concerning 
the relationship between degree of schizophrenia and rigidity sug- 
gested by the implications of the syndrome but where the degree of 
rigidity was not used to sort out the two groups. This latter is quite 
possible if the diagnostician can clearly specify the criteria used in 
making the sorts on severity. 

The second design problem is more difficult to solve satisfactorily. 
We want the two groups to differ only on severity of schizophrenia. 
How do we accomplish this? When manipulating task or environ- 
mental variables we have (as one technique) used random assignment 
to accomplish the equalization of factors other than the one we are 
varying. We might think at first glance that we could do the same 
here. We could take a random sample of each of two large groups 
(one diagnosed as mild, the other severe) on the assumption that 
this would equalize for other factors. Very quickly, however, a 
second glance will show us that this could be a lethal procedure. 
Suppose the two groups differed in age with the severe group having 
an older mean age. Suppose further that we actually knew that 
rigidity and age were positively related. If we find a difference in 
rigidity between our two schizophrenic groups we would quickly 
realize that this could well be independent of the severity of schizo- 
phrenia. So, what do we do? 
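
The difficulty can be shown with a deliberately artificial sketch. All
of the numbers and names below are hypothetical: rigidity is defined to
depend on age alone, so severity of illness contributes nothing by
construction, and the severe group is simply made older. A difference in
rigidity between the groups appears anyway, and it vanishes when only
subjects of the same ages are compared.

```python
from statistics import mean

def rigidity(age):
    """Hypothetical rigidity score that depends on age only, not on severity."""
    return 10 + 0.5 * age

mild_ages = list(range(20, 50))    # hypothetical mild group, ages 20 through 49
severe_ages = list(range(30, 60))  # hypothetical severe group, ages 30 through 59

# Difference between whole groups (random samples would show it on the average).
unmatched_diff = (mean([rigidity(a) for a in severe_ages])
                  - mean([rigidity(a) for a in mild_ages]))

# Keep only subjects whose age also occurs in the other group.
matched_mild = [a for a in mild_ages if a in severe_ages]
matched_severe = [a for a in severe_ages if a in mild_ages]
matched_diff = (mean([rigidity(a) for a in matched_severe])
                - mean([rigidity(a) for a in matched_mild]))
```

Here the unmatched difference is 5 points of "rigidity" although
severity has no effect at all; the sketch says nothing, of course, about
variables other than age on which the groups might also differ.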

It would seem that we would have to turn to a matching pro- 
cedure. So, we first match on chronological age. But this only? Well, 
no, perhaps we should match on sex, mental age, socio-economic 
background, racial background, education, length of stay in hospital, 
have been attacked with some degree of success, certain working 
hypotheses have been first advanced and then the implications of 
these hypotheses explored by research. More specifically, these 
working hypotheses relate the subject variables to fundamental be- 
havioral processes. Again let me say by fundamental behavioral proc- 
esses I mean processes which have been investigated independently, 
for which there are laws with environmental variables, and for which 
there is evidence that they permeate a wide range of behavior phe- 
nomena. I would include under this idea of fundamental such proc- 
esses as maturation, learning, forgetting, inhibitory processes, moti- 
vation, and so on. 

The investigator starts out with an idea or hypothesis that differ- 
ences in subject variables (such as differences in severity of schizo- 
phrenia) reflect the operation of more fundamental processes. What 
he says, in effect, is, if this difference is due to this or that process 
(or a combination) then he would expect this (difference in rigidity) 
to obtain. He is applying a set of principles of behavior about which 
we already know considerable to another area of behavior other than 
that used to derive the principles in the first place. If a reasonable 
number of tests of the implications of the application are positive, 
we then begin to accept the original hypothesis which identified 
differences in a subject variable with a difference in a more funda- 
mental process. 

The studies on manifest anxiety, originally stemming from Iowa, 
have taken this approach. The fundamental hypothesis advanced was 
that differences in anxiety represent differences in drive. If this is so, 
then according to what was known about drives and their theoretical 
elaboration, such-and-such should happen in certain situations. This 
is also the approach taken by Eysenck (e.g., 8). Eysenck first (by 
factor analysis) obtains what to him are general descriptive dimen- 
sions of personality. Then he asks himself what fundamental proc- 
esses lie behind or cause individual differences on these dimensions. 
For one dimension he suggests that differences in inhibition, as 
inferred from classical experimental research, may be involved. He 
then proceeds to test for differences in magnitude of empirical phe- 
nomena (believed to reflect differences in inhibition) for individuals 
who score differently on his descriptive personality dimension. Cron- 
bach and Meehl (<J), if I understand them, almost reach this position 
place we have arrived at what I have called previously the pitifully 
weak conclusion that we now have a new technique— performance 
in a conditioning situation— to diagnose anxiety. Secondly, and this 
is far more important at this stage, we are liable to the criticism that 
anxiety may not be the critical variable involved. It is like the rela- 
tionship between chronological age and rigidity. Perhaps the groups 
differing in anxiety also differ in age, intelligence, learning ability, or 
in a great many variables that might influence performance in the 
conditioning situation independent of anxiety. Basically we are at 
the same vulnerable and helpless point that we were in our study of 
schizophrenia and rigidity. How do we get our groups equivalent on 
all variables except anxiety? The answer is we don’t. We might 
match on factors which are known to influence performance in the 
conditioning situation and perhaps on other factors if it is convenient 
to do so. Of course, if we do accomplish such matching and suddenly 
the relationship between performance and anxiety disappears we 
have identified the critical variable as something other than anxiety. 
The whole pattern of research would change at this point. But, let 
us continue the illustration by assuming that matching leads to no 
change in our results— performance is still related to anxiety. (It may 
be noted parenthetically that by such matching procedures we are 
obtaining a great deal of information about what subject variables 
are irrelevant to performance in a conditioning situation.) But we 
have asserted and still must assert that we cannot be confident that 
we have eliminated all variables as possibly more basic to the rela- 
tionship than anxiety. That is, some variable which is somewhat cor- 
related with our measure of anxiety may still be responsible for our 
empirical relationship so that if we knew what this variable was and 
held it constant while still varying anxiety our relationship would 
disappear. There is no completely satisfactory solution to this 
dilemma. But what we do is push the implications of the drive 
hypothesis through a series of experiments, exploring many possible 
implications. If the results rather consistently parallel those obtained 
when other drives are manipulated our confidence is substantially 
increased that we are justified in relating the results obtained from 
this variable (anxiety) to our body of knowledge concerning other 
drives. We thus gradually remove anxiety as a behavioral phenom- 
enon requiring independent theoretical solution. It will be explained, 
differ on the characteristic. For example, in the case of anxiety, we 
might choose two random groups from the same population and 
attempt to experimentally induce high anxiety in one group and low 
in another to see if our results on some task confirm results based 
on selecting different groups having more or less permanent differ- 
ences in anxiety. If the results do conform, we have in a sense 
eliminated the possibility that our results obtained from permanent 
anxiety groups could have been a function of some unknown factor 
being partially correlated with anxiety. Perhaps two or three tests 
would be necessary to be confident of this conclusion but it would 
shorten our program of research considerably if the results were 
positive. If the results are negative only slight doubt is cast on the 
original hypothesis. For, our experimental conditions designed to 
introduce differences in anxiety may be inadequate or, if judged 
adequate, the experimental anxiety may not serve as a drive in the 
same sense as the anxiety “naturally” present in different amounts in 
subjects. Nevertheless, we should always examine the situation to 
see if it is possible that we might experimentally introduce differ- 
ence in subject variables. Obviously, there are many subject vari- 
ables in which this is not possible. We would probably find it diffi- 
cult to experimentally change the chronological age of our subjects 
and there would probably be objections from certain quarters if we 
tried to experimentally induce different degrees of schizophrenia in 
originally normal subjects. Such situations sometimes lead to work 
on lower animals, which may then be used to support inferences on 
the human level. 

In the above discussion I have indicated that before the identifica- 
tion of a subject variable and a basic behavioral process can be 
made, several confirming tests must occur. I have also suggested 
that confidence in the identification is increased if relatively unique 
predictions can be made and confirmed. Thus, in the case of anxiety, 
the expectation (based on drive theory) is that in a situation where 
there is little interference high-anxiety subjects will be superior in 
performance to low-anxiety subjects but the reverse will be true if 
the interference is high. Nevertheless, this whole identification proc- 
ess has a danger component in it. Behavior can only change in an 
upward or downward direction as a consequence of manipulation 
will also not solve the problem, for potentially we should have to 
match on every possible variable known. The matching problem is 
not solvable but even if it were we would arrive at a weak conclu- 
sion scientifically, namely, that if we get a difference in performance 
we have discovered another diagnostic tool. 

(b) Since we cannot match on everything, if we proceed with 
the research anyhow we are running a real risk that differences 
which we may obtain are a function of a partially correlated subject 
variable which we have not identified. 

3. The best solution seems to be to not try to match except on 
variables which are known to be relevant to the task or others which, 
if not known to be relevant, are easily handled in a matching situa- 
tion. Then, we attempt to relate or identify the subject variable 
under investigation to a more fundamental behavior characteristic 
about which we already have considerable laws and see if our ex- 
pected relationship holds for this subject variable. A single experi- 
ment is relatively worthless in this context. We normally would 
expect a number of positive tests before we feel confident that our 
original hypothesized identification is tenable. 

4. For some subject variables this research may be accelerated by 
the use of experimental manipulations which temporarily induce 
different amounts of the subject characteristic. 

In completing this discussion of the problems involved in manipu- 
lating subject variables, I want to consider the implications of nega- 
tive results in this type of research. When manipulating an environ- 
mental or a task variable, whether or not the results per se are posi- 
tive or negative with regard to the variable is a matter that has little 
bearing on a judgment whether or not the design of the investiga- 
tion was sound. That is, if I manipulate an environmental variable 
and get a positive relationship between my variable and behavior, I 
accept this relationship only if I can perceive that no confounding 
variable has operated. If I get negative results, i.e., no relationship 
between the variable and behavior, I would likewise accept the re- 
sults if I can perceive no confounding. 

When a subject variable is manipulated the major problem, as 
discussed extensively above, is what to make of positive results. I 
have said that when a positive result is obtained we do not know 
CONFOUNDING BY ENVIRONMENTAL VARIABLES WHEN 
MANIPULATING ENVIRONMENTAL VARIABLES 


In the previous chapter, I discussed 
confoundings by subject variables when manipulating environ- 
mental, task, and subject variables. These confoundings are identified 
with cells 3, 6, and 9, respectively, in the table on page 91 of the 
previous chapter. I now want to consider cell 1 of this table. This 
cell refers to confoundings by environmental variables when ma- 
nipulating environmental variables. The treatment of this topic 
involves a little preparation by way of indicating somewhat more 
specifically the nature of the problems which arise. 

The problems centered around use of control groups arise almost 
exclusively in research where an environmental variable is being 
manipulated. The pure case of the control group is one in which 
the subjects of this group have not been given any experimental 
treatment; their behavior is then compared with that of another 
group (the experimental group) which has been given experimental 
treatment. To be unnecessarily precise I suppose we must say that 
you can’t give a control group zero treatment; the subjects in this 
group do not exist in a vacuum while the experimental group is 
being treated. But in a practical sense the control group does receive 
a zero amount of the environmental variable when the effect of such 
a variable is being studied. The point I wish to make is that in con- 
trast to this situation, when we are manipulating a task variable we 
rarely have a group which is given zero amount of treatment. Some 
examples may shape the contrast. If we are manipulating meaning- 
fulness we don’t have material which has zero meaningfulness. If a 
condition is called zero meaningfulness it is called this only with 
report clearly implies that the change in hostility resulted from 
[...] discrepancy between what was concluded [...] 
be concluded. In evaluating such experiments [...] perceive other 
environmental factors which could [...] 
as the one attributed to the [...] present 
experiment, for example, the subjects [...] the University of 
Chicago for the special [...] liberal tradi- 
tion involving attitudes toward different ethnic groups is in fact 
exemplified at this university then [...] changes observed may have 
been due to the assimilation of this [...] therapy. 
While in this case it may be possible [...] factors which would 
produce the change in this [...] one is not obligated 
to do this in order to [...] 
Changes in ethnic hostility may take place for reasons we cannot 
identify; the only way [...] is to give a control 
group the two testings without the [...] of therapy. Only then 
can we attribute differences to [...] of therapy. From 
the experiment as it now stands [...] conclude anything 
more substantial than that the [...] second test showed less 
hostility than those on the first. [...] condition for the change 
cannot be specified. By way of [...] forward I might say that 
other issues concerning analyses [...] will be 
discussed later. 

2. In discussing the operational definition of reminiscence in the 
previous chapter, I indicated that [...] investigators carried 
out research on this topic [...] a control group. The 
basic procedure was to give the [...] learning trials, 
then an immediate retention test, [...] a second retention test 
after an hour, 24 hours, a [...] interval with which the 
investigator was concerned. [...] outcome of the second retention 
test was superior to the first, [...] have occurred. 
The implication was that “something” happened between the first 
and second retention test that enhanced the recall. 
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ations, you will have to tolerate some repetition between the present 
exposition and the one given when discussing E/C definitions. 

FAILURE TO USE A CONTROL GROUP 

Boring ([...]) has reviewed the history of the use of control groups
in psychology. He points out that the first use of a bona fide control
group was the classical Thorndike-Woodworth study on transfer of
training in 1901. Since that time there has been a gradual increase
in the frequency with which control observations are used. This
increase might be accounted for because of shifts in research areas
in which the phenomena require control groups for adequate defi-
nition. But it also might be expected because of the progressively
more analytical nature of a science as more and more phenomena
are discovered, and as stimulus complexes are broken down into
more unitary dimensions. Whatever the historical correlates are to 
this trend, at our present stage of methodological sophistication 
regarding the necessity for the use of a control group when estab- 
lishing the reliability of a phenomenon based on environmental 
manipulation, it is discouraging to find reports in recent literature 
where there is complete failure to use a control group of any kind. 
Let us be sure we understand the seriousness of this situation as 
disclosed in actual research reports. 

1. The purpose of one study, as given in the introduction to the
report, was to determine the effects of a series of group therapy
meetings on ethnic hostility (20). It is not our concern here whether
or not the instrument used to measure hostility was adequate. The 
principal points of the procedure were simple. The 24 subjects were 
first measured on an ethnic hostility scale. Then, for six weeks they 
participated in a client-centered counseling training program. As a 
part of the program group therapy sessions were held with a trained 
therapist in charge. The total time of such sessions was about 35 
hours and this was the manipulated variable of the research. At the 
end of the six-week period the subjects were again measured on 
hostility by the same instrument used at the initiation of the pro-
gram. A comparison of the initial and final scores shows that subjects
exhibited less ethnic hostility on the final test than on the first. The

4. For some reasons, investigators using various procedures de-
signed to reduce the distress of the mentally ill have been particu-
larly myopic toward the use of a basic control group to determine if
the therapeutic procedures produce positive results. This has been
true in some cases where electroshock has been used (e.g., 7) and
where frontal lobe operations are employed (e.g., 12). We shall
later see an illustration of the use of an inappropriate control group
in lobotomy research. When we attempt to evaluate the influence of
face-to-face talking therapy (e.g., 11), we should by now realize
that no conclusions concerning the effect of therapy per se can be
achieved unless appropriate control group or groups are used. I do
not think our literature should be cluttered with these anachronistic
procedures even if editors and authors recognize and publicly admit
these shortcomings, for the shortcomings are fatal in a scientific
sense.

I do not wish to spend more time on this matter; the failure to use 
a control group is such an obvious error that our exposition of it 
should not be extended. We have much more ground to cover and 
the errors yet to come are for the most part more subtle than the 
simple failure to use a control group. But, it may be asked if there
is any time or any situation in which a control group is not needed
to establish the relevance or nonrelevance of a single [...]
condition. Undoubtedly there are such situations, although I have
been unable to think of any in which only one or two observations
of behavior are to be made. However, let us take a couple of situa-
tions in which many observations are recorded and see if you would
not agree that an error in our conclusion is unlikely even though we
have failed to include a control group.

Suppose we have a group of adults who by standard testing pro-
cedures have been classified as imbeciles for 20 years. [...]
20 years they have been tested and each year the [...]
no appreciable change. Then, a new drug (“anti-imbecile”) is placed
on the market. On one day all members of the group [...]
injection of the drug and the next day the test scores all fall within
the normal range. Although it is remotely possible [...]
other than the drug caused the change, it is highly [...]
one would care to defend this position strongly. The 20 years [...]

was not referred to as an environmental factor but rather as some-
thing spontaneously occurring within the subject. But, as was sub-
sequently pointed out by other investigators, the first retention test
may have served as a learning trial; thus, the degree of original learn- 
ing was not the same for both retention tests. Time and degree of 
learning are being confounded. Obviously a control group which 
took the second test immediately after the first was needed in order 
to determine whether or not a pseudophenomenon was involved. 

3. Next, let us consider a study in the area of frustration (4). The 
basic design of the experiment followed the order: test, period of 
blocking, test again. The subjects were children and a special meas-
ure of behavior called constructiveness of play was carefully worked
out. Essentially the measure represents maturity of play and is quite
highly related to intelligence. In the first observation period, the
subjects were rated for constructiveness during a 30-minute free-
play period. Then a period designed to induce frustration was in-
serted. Following this the subjects were put back into the original
play situation and again constructiveness of play was measured. The
results show that the average constructiveness age of play in the
second test period was considerably below that of the first and it 
was concluded that this resulted from the treatment designed to 
induce frustration (a more complete analysis and criticism of this 
experiment has been published, 10). 

To my way of thinking a control group is mandatory for such 
an experiment, this group having the two test periods but not having 
the blocking. It may well be that constructiveness of play in the 
second period would be considerably reduced even without the 
period of blocking. (It is interesting to note that the investigators 
in this research have, for other research situations, used an explana- 
tory concept of satiation. This concept would roughly predict a 
decrease in constructiveness merely as a result of continual exposure
to the same situation.) Note that we cannot say that blocking had
no effect on constructiveness but neither can we say that it did.
More precisely, in line with the previous chapter, we would say that
frustration had not been demonstrated, hence was not defined. This
is the intolerably ambiguous situation which use of a control group
would obviate.

SYSTEMATIC VARIATION AND THE CONTROL GROUP

In some research, conclusions of importance can be reached even 
if no control condition is used provided the experiments involve 
manipulation of a variable so that it is tested at several points. As an 
example, suppose that I performed an experiment on transfer of
training in which degree or amount of first-task learning is varied by
several steps, say, 5, 10, 15, and 20 presentations of the first task.
All subjects are given the same number of trials on the second task
and my basic measure of transfer is performance on the second task.
Note that I do not have a control group, that is, a group which had
no trials on the first task. Assume we have completed the experiment
and obtain the results indicated in the upper part of the accompanying
figure. I can conclude with confidence (merely reflecting the graph)
that for these materials performance on the [...] in proportion to
number of [...] performance directly with transfer I [...] transfer
increases directly as number of trials on the first [...] strictly
speaking, I do not know whether my [...] second task reflect
increasing amounts of positive transfer, decreasing amounts of
negative transfer or some combination. [...] If I had a control group
which had no trials on the first [...]

observation essentially provide the control condition. One might
suggest that to be on the safe side an injection of a harmless liquid 
should have been given to some of the members as a control but I 
for one would not push this. Such patients have had plenty of oppor- 
tunities to receive placebos of all kinds and yet their intelligence had 
remained consistently at the imbecile level. 

Even with much less extended observations a similar extrapolation 
of previous performance can be used as an effective control. I refer 
to less extension in time although not necessarily a less extended 
number of observations. For example, if we give a group of subjects 
a long series of trials on the pursuit rotor under massed conditions 
and then insert a short rest interval, performance after the interval 
will improve remarkably. Had we retained the massing, that is, had 
there been no rest interval, we would have obtained what we would 
consider a control score. Obviously, we can’t give the same group 
both conditions so that if we need this control score we would have 
to run another group to determine how much change took place 
with and without the rest interval. However, the extended series of 
trials before the introduction of the rest interval allows us to predict 
with high accuracy what the score would have been without the rest 
interval. The 20 (let us assume) massing trials, for which we have a 
score on each, serve the same purpose as the intelligence tests given 
each year for 20 years in the above problem. In short, we can deter- 
mine with very little error what the performance would have been 
without the rest interval merely by extrapolating beyond the 20 
massed trials. Thus, we can use a single condition— the experimental 
condition— and arrive at the same conclusion at which we would 
have arrived with the usual control condition. Control measurements 
are not overlooked in these cases; rather, they are obtained by pro-
jecting a stable performance curve to predict what would have hap-
pened had the same conditions been retained. Of course, justification
for the projection depends somewhat on the stability of the perform- 
ance curve and background information concerning the phenom- 
enon under investigation. In any case, do not let my fervor to make 
you sensitive to the need for control groups obscure the realization 
that there may be a few situations in which the control may not be 
needed. 
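
The extrapolation argument can be put in the form of a small sketch. The scores below are invented for illustration and are not data from any study, the straight-line fit over the last few massed trials is only one assumed way of projecting a stable curve, and the names `fit_line` and `projected_control` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def projected_control(scores, window=5):
    """Project the next score from the last `window` trials of a series."""
    trials = list(range(len(scores) - window + 1, len(scores) + 1))
    slope, intercept = fit_line(trials, scores[-window:])
    return slope * (len(scores) + 1) + intercept


# Invented time-on-target scores for 20 massed trials (assumption).
massed = [10, 14, 17, 19, 21, 22, 23, 24, 24, 25,
          25, 25, 26, 26, 26, 26, 26, 26, 26, 26]
after_rest = 34  # invented score on the first trial after the rest interval

# The projected score stands in for the control group that was not run.
control_estimate = projected_control(massed)
gain_attributed_to_rest = after_rest - control_estimate
```

The projection is only as trustworthy as the curve is stable; with a series still rising steeply over the final trials the same arithmetic would give a much less defensible control score.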
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the additional control is needed) 

[...] behavior occurs, the specific causal [...] with a specific
phenomenon when it was the purpose of the research to so identify
the condition. [...] you a number of illustrations since deficiencies
[...] are [...].

1. [...] learning is that the [...] stimuli [...] different responses
are to be attached. An implication of the theory is that if there are
different degrees of similarity among stimuli, learning rate will be
inversely [...] similarity. This implication has been [...] many
times. But, another implication is that if stimuli [...] apparent
degree of similarity, and subjects [...] making discriminations among
the stimuli (predifferentiation) before the learning task is
instituted in which different responses [...] to be attached, learning
will be facilitated. Several studies (e.g., 7, [...]) supported this
expectation. Briefly, [...] presenting the stimuli to the subject in
some sort [...] discrimination learning until he clearly can
differentiate one [...] subject is then given a new task in which he
[...] attach new responses to each stimulus. The control group in
these experiments is not given the predifferentiation experience; it
is given only the final test task. Diagrammatically, the two
conditions appear as:

                 Predifferentiation?    Test Task?

Control:                 No                Yes

Experimental:            Yes               Yes

If comparison of performance on the test task shows the experi-
mental group to be superior to the control group, it is attributed to
the predifferentiation experience.

If the details of the above [...] adequately carried out (as they
have been in these [...]) there is no denying the conclusion that
differences in [...] due to the predifferentiation experience. But
this [...] conclusion is not the one at which the investigators [...]
arrived. The conclusion referred to the specific discrimination among
stimuli as being the
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sign of the transfer. As it stands, I do not know whether in absolute 
magnitude the performance indicates negative, positive, or negative 
for some points, positive for others. I do know that I have a variable 
which significantly influences the amount of transfer. 

But now let us suppose the results from this experiment were 
as depicted in the lower half of the figure. I think the first tendency 
is to say that according to this graph transfer is not related to degree 
of first-task learning. Of course, we could say that between 5 and 20
trials of first-task learning there is no relationship with transfer. But
if we had a control group we might find that all points show negative 
transfer, positive transfer, or zero transfer. It may be that transfer 
is related to degree of first-task learning up to about five trials, after 
which it levels off. In any case, the control group greatly broadens 
the conclusion which can be reached. (In this particular illustration, 
by careful planning it is possible to get control measurements by 
using performance on the first list; but in many such experiments 
this is not possible.) 

So much for this matter; it will be a rare case when one or more
control conditions or groups do not add appreciably to the conclu-
sions of the experiment even though a systematic exploration of a
stimulus dimension has been undertaken.
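
A short sketch may show why the control score fixes the sign of the transfer. All numbers are invented for illustration, and `transfer_effects` is a hypothetical helper, not a procedure taken from any study discussed here.

```python
def transfer_effects(second_task_scores, control_score):
    """Return signed transfer for each amount of first-task practice.

    Positive values mean the group did better on the second task than a
    control group with no first-task trials; negative values mean worse.
    """
    return {trials: score - control_score
            for trials, score in second_task_scores.items()}


# Invented mean second-task scores keyed by number of first-task trials.
scores = {5: 12.0, 10: 14.0, 15: 16.0, 20: 18.0}

# The same rising curve reads differently depending on the control score.
all_positive = transfer_effects(scores, control_score=10.0)
all_negative = transfer_effects(scores, control_score=20.0)
mixed = transfer_effects(scores, control_score=15.0)
```

In each case the curve still rises with first-task practice; only the control score tells us whether the rise is growing positive transfer, shrinking negative transfer, or a crossing from one to the other.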

FAILURE TO USE APPROPRIATE OR 

NECESSARY CONTROL GROUPS 

In research where no control group is used, and where there is
essentially only one treatment, the data may demonstrate a signifi-
cant change in behavior from pretest to posttest. The ambiguity
lies in the fact that it is impossible to tell whether the change resulted
from the conditions inserted by the investigator or from some factor
or factors which occurred between the two testings. Even if no
change is found from the first testing to the next, a conclusion that
the variable is ineffective is not completely acceptable because a
variable influencing behavior in a contrary way may have operated.
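
The arithmetic behind this ambiguity can be sketched briefly. The scores are invented, and the function names are hypothetical; the sketch simply subtracts the pretest-to-posttest change of a control group from that of the treated group.

```python
def mean(values):
    """Arithmetic mean of a list of scores."""
    return sum(values) / len(values)


def change(pretest, posttest):
    """Mean posttest score minus mean pretest score for one group."""
    return mean(posttest) - mean(pretest)


def treatment_effect(exp_pre, exp_post, ctl_pre, ctl_post):
    """Change in the treated group beyond the change seen without treatment."""
    return change(exp_pre, exp_post) - change(ctl_pre, ctl_post)


# Invented hostility scores (lower means less hostility).
exp_pre, exp_post = [30, 28, 32, 30], [24, 22, 26, 24]
ctl_pre, ctl_post = [30, 29, 31, 30], [25, 24, 26, 25]

raw_change = change(exp_pre, exp_post)
effect = treatment_effect(exp_pre, exp_post, ctl_pre, ctl_post)
```

With these invented figures the treated group drops six points, but the untreated group drops five, so only one point can be assigned to the treatment. Without the control scores the whole six points would have been credited to it.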

In the cases now to be discussed, the error is not one of failing to
use a control group but failure to use an appropriate control or
failure to use an additional control group that is necessary to arrive
at the conclusion desired. The control group is inappropriate (or

and [...] control patients. The control [...] the point of [...] is
performed. Members of the experimental group had various parts of
the frontal lobes rendered ineffective by surgical procedure.

Many, many tests were given [...] the operation and some
differences in [...] the pretest to posttest were noted between the
two groups. [...] subsequent discharges tended to favor [...] the
results, while not strongly recommending the operative [...], were
not completely negative to it.

The purpose of this experiment [...] frontal-lobe operation was to
determine the effects of [...] is a fair evaluation to say that
[...] not fulfill its purpose because the control-group [...]
inadequate. The assault upon the skull or skull cavities, [...] large
or small it may have been for members of the [...] group should have
been reproduced in members of the control group; all conditions
should have been exactly the same [...] except the surgical contact
with the frontal lobes for [...] experimental group. Only by such a
procedure can a conclusion [...] the influence of frontal-lobe
cutting. That this appropriate control procedure was not used is a
little [...] various forms of shock on mental illness was being
[...] in many hospitals and it would have been reasonable to [...]
surgical shock may be a counterpart of other forms of shock. [...]
a matter of fact, scattered evidence could have been [...] together
suggesting that if the operative procedure has any [...] influence,
it is due to the shock accompanying the operation [...] cutting of
the lobes.

3. We turn next to a quite [...] field of research and again
evaluate a study where certain control procedures were used but in
which a critical control is [...] phenomenon dealt with was
assimilation. By this is meant [...] tendency for objects to appear
like the typical object of the class [...] which they belong. I shall
simplify the conditions of the [...] interest of brevity. A figure
was flashed [...] tachistoscope and the subject was asked to draw
immediately [...] saw. The figure might be
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factor which produced the facilitation in the performance on the 
test task for the experimental group. That is, the results were taken 
to support the theory that the predifferentiation experience reduced 
the effective similarity among the stimuli. As we now view this 
situation it is believed that the use of this control group is in- 
adequate to support the conclusion. Again, the discussion must refer 
back to priority of concepts. The predifferentiation experience given 
the experimental group may have allowed for the operation of two 
well-established phenomena either of which might have produced 
the facilitation on the test task. One of these phenomena is called
learning-how-to-learn or practice effects, and the other, warm-up.
The predifferentiation experience may have allowed these phenom-
ena to operate so that the transfer to the test task may not be the
result at all of the discrimination presumably set up among the
stimuli by the experience given the experimental subjects. The
appropriate procedure would be to give the control group pre-
differentiation experience on a task in which the stimuli were differ-
ent from those used in the test task. Thus, the idea is to allow
[...] warm-up effects to influence the performance [...] equally for
both groups. This leaves only the [...] stimuli as the difference
between the groups so that if [...] performance of the experimental
group is better than that [...] be allotted to the experience [...]
particular stimuli used on the test task. And we [...] a new
phenomenon over and above [...] phenomena. Actually, when
appropriate control [...] of predifferentiation [...] in others
(e.g., 3). Of course, [...] if one wanted to use three control groups
it [...] determine how much each of the three factors
(learning-how-to-learn, warm-up, predifferentiation) is contributing
[...] performance on the test task.

2. One of [...] research projects of postwar years attempted [...]
conclusive answer to the question of whether or not [...]
psychiatrists, and surgeons [...]
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unwarranted for it has not been demonstrated by 

To do this, some sort of a control group is needed in which instruc-
tions of “what to expect” are given, but either (1) [...] on the
screen, or (2) a figure [...] irrelevant to [...] flashed. I cannot
be sure which would be the [...] without preliminary [...]. It
would be ideal if the subject could be [...] believe that something
besides just a flash was being flashed. The [...] subjects in the
experimental group [...] might have drawn exactly what they did draw
[...] though they saw nothing, or even though they saw [...]
different from what they were told to expect. [...] been little or
no interaction between the instructions and what they saw on the
screen, that is, there could [...] perceptual assimilation. The
conclusions [...] investigators are questionable until a control
group treated in some such way as suggested is inserted in the design.

4. I suspect that time is the most [...] manipulated environ-
mental variable in all of psychology. Also, it occurs frequently as a
confounding variable. I will give four illustrations of the types of
errors which are made. [...] first design is [...] developed a theory
of transfer which predicted that a noxious stimulus given imme-
diately after learning the first task would, crudely, “blot out” the
first task so that there would be [...] transfer than if no noxious
stimulus were given. [...] the theory predicted that the longer the
“rest” following [...] stimulus, the less the effect. It was this
latter prediction [...] interested in and to test it he used the
following three groups.

Group I: Task A; raucous buzzer for [...] minute; Task B

Group II: Task A; raucous buzzer for [...] minute; [...] Task B

Group III: Task A; raucous buzzer for [...] minute; [...] Task B

The theory would thus predict [...] Group III than for Group II, and
greater [...] Group II than for Group I.

It is quite clear that the [...] Task A and Task B differs for the
three conditions. Group [...] nominal control group for [...]
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perceived as one of two objects which had rather high similarity. 
Although several figures were used, I shall use only one as illustra- 
tion. A figure was used which might be perceived as a lima bean or 
as a canoe. The subject was first presented this for a very short 
exposure period, namely, 10 milliseconds. But, after this presentation
he was asked to draw what he saw. The figure was then presented 
for successively longer exposure times, the subject drawing what he 
saw after each exposure. Finally he was shown the figure for as long 
as he wished and was asked to draw it as faithfully as he could. 
He was then asked to draw a figure which would look as much like 
a lima bean as possible and one which looked as much like a canoe 
as possible. Judges then rated the drawings made under tachisto-
scopic presentations by using as reference points the three drawings
made without time limit of presentation.

The two critical conditions were:

Control: Simply told to draw what they saw following each tachisto-
scopic exposure.

Experimental: Told that they would be shown a canoe (or a lima
bean) before the series of increasing length of tachistoscopic
procedures was started.

The critical comparisons consisted in whether or not the subjects
[...] would see a canoe (or lima bean) drew a figure [...] like a
lima bean) than those who were told [...] did. That is, the subjects
who were told [...] bean [...] drew figures more like a lima bean
[...] told nothing and their figures were more [...] those who were
told they would be shown a [...] in [...] drawings decreased
somewhat as exposure time increased but not markedly.

[...] results is that assimilative memory [...] are supposed to take
place over long retention [...] in reproductions drawn immediately
after [...] perception is assimilated or modified to represent [...]
of the class to which it belongs when it is presented in an ambiguous
fashion (as in the tachistoscopic method). [...] conclusion assumes
an interaction between the instructions of
what to expect and the figure flashed on the screen. But this is

6. Another study ([...]) manipulating the [...] origin in an aspect
of psychoanalytic theory which [...] trauma is one cause of
neurosis. This particular [...] out to see if trauma could be
reduced in intensity by [...] small doses. What this reduces to is
that if the subject [...] exposed to a trauma-producing situation
there will be less [...] is by distributed practice than by massed
practice. [...] 21 puppies about [...] weeks old. The [...] a box
which was so small that the puppy could [...] turn around. Some
pilot observations indicated that [...] the box produced
considerable agitation and [...] measured by number of movements
[...] number of yelps the animal emitted. Subjects in the massed
group [...] minutes in the box; as far as the time [...] concerned
this is the control condition. Those subjects in the distributed
group had 1 minute in the box, 1 minute out, 1 [...] total of 10
minutes had been spent in the box. [...] the number of yelps made by
the massed group [...] significantly greater than the number made by
the [...]. Number of movements did not differ appreciably.

I am not concerned here whether or not [...] constitutes any sort
of test of psychoanalytic theory. [...] or not the box was [...]
might be questioned because three subjects were [...] the experiment
because they voluntarily entered the box. [...] operational grounds
there is justification for calling this a trauma-producing situation.
My concern is with other matters. Associated with the distributed
condition was the fact that the puppies [...] in and 10 in being taken
out. The handling alone may have been responsible for the difference
in behavior [...] control in which the handling occurred with massed
practice [...]; or, complete avoidance of handling in the [...]
group might have been accomplished. Another matter is [...]
measurements on the two groups were not taken at [...] intervals
from the start of the experiment. Thus the distributed group had a
total of 20 minutes in the general experimental situation, the massed
group only 10. Perhaps adaptation to the situation as a function of
time took place. If a massed control group spent [...] minutes in
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there is no rest after the buzzer. Assume that the results show what 
was predicted by the theory. Can we attribute the differences to time 
after buzzer? It doesn’t seem so. Perhaps transfer would show the 
same relationship if the buzzer had not been present in any of the 
conditions; time may be the effective variable, not time after buzzer. 
It would seem necessary to use three control groups in which no 
buzzer was given but in which the time between Task A and Task B
corresponds to each of the intervals in the three experimental groups.
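
The fuller design can be laid out as a crossing of buzzer (present or absent) with each interval. This is only a sketch: the interval values are hypothetical placeholders, since the actual durations are not recoverable here, and the group means are invented to show a case in which time alone accounts for the results.

```python
from itertools import product

# Hypothetical rest intervals (minutes) between the buzzer and Task B.
REST_MINUTES = [0, 5, 10]


def build_design(rest_minutes):
    """Cross buzzer (present/absent) with every Task A to Task B interval."""
    return [
        {"buzzer": buzzer, "rest_minutes": rest}
        for buzzer, rest in product([True, False], rest_minutes)
    ]


def buzzer_effect_by_interval(mean_transfer):
    """Buzzer minus no-buzzer transfer at each interval.

    `mean_transfer` maps (buzzer, rest_minutes) to a group mean.
    """
    rests = sorted({rest for _, rest in mean_transfer})
    return {rest: mean_transfer[(True, rest)] - mean_transfer[(False, rest)]
            for rest in rests}


design = build_design(REST_MINUTES)

# Invented group means in which transfer rises with time alone.
means = {(True, 0): 4.0, (True, 5): 6.0, (True, 10): 8.0,
         (False, 0): 4.0, (False, 5): 6.0, (False, 10): 8.0}
effects = buzzer_effect_by_interval(means)
```

Looking only at the three buzzer groups, these invented means would seem to confirm the theory; the matched no-buzzer groups show that the buzzer contributes nothing at any interval.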

5. In one experiment the investigators (28) wanted to determine
the influence of different lengths of delay of knowledge of results on
accuracy of drawing three-inch lines while blindfolded. Three dif-
ferent groups were used, each being given a series of trials. The
essential aspects of the procedures were as follows:

Group I. Draw line; given immediate knowledge; rest 10 seconds;
draw line, etc.

Group II. Draw line; wait 10 seconds; given knowledge; rest 10
seconds; draw line, etc.

Group III. Draw line; wait 20 seconds; given knowledge; rest 10
seconds; draw line, etc.

[...] Group I drew a line every 10 seconds, those in Group II,
every 20 seconds, and those in Group III, every 30 seconds. [...]
performance have varied as a function of these different intertrial
intervals even without differences in delay of knowledge? [...] it
would not have, but such judgments should [...] for appropriate
control groups would have [...] confounding of time by time as a
source of concern. [...] control groups, having immediate knowledge
but with [...] respectively before drawing the next [...]
differences in time between drawings as a confounding [...] that it
was a variable influencing performance. You might also suggest that
equating time between successive drawings for all three groups would
solve the [...] would [...] an interpretative problem [...]
variation in delay after drawing [...] an inverse relation between
knowledge and [...] of the next line. To which time interval would
one attribute differences in performance?
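
The confounding of time by time in this design is a matter of simple addition, sketched here. The delays and the 10-second rest come from the three procedures described; the choice of 30 seconds as the equated interval is an assumption made only for illustration.

```python
REST_AFTER_KNOWLEDGE = 10  # seconds, as in the three groups described

# Delay of knowledge of results (seconds) for each group.
DELAY = {"I": 0, "II": 10, "III": 20}


def intertrial_interval(delay, rest):
    """Seconds from one drawing to the next."""
    return delay + rest


# As run: the interval between drawings grows with the delay.
original = {g: intertrial_interval(d, REST_AFTER_KNOWLEDGE)
            for g, d in DELAY.items()}

# Equating the interval between drawings (here at 30 seconds) forces the
# rest after knowledge to shrink as the delay before knowledge grows.
EQUATED_INTERVAL = 30
rest_when_equated = {g: EQUATED_INTERVAL - d for g, d in DELAY.items()}
```

Either way two time intervals vary together across groups, directly in the design as run and inversely in the equated version, so differences in performance cannot be assigned to one of them without further control groups.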
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cal conclusions when the situation did not justify 

When a control group with no color-naming was [...] with a
task which would prevent rehearsal, differences in error frequency
evaporated.

SOME SPECIAL PROBLEMS RELATED TO
USE OF CONTROL GROUP

Much of the material to be covered in [...] three reports, one by
Solomon (30), one by [...] by Hovland et al. ([...]). I have [...]
designs [...] yield still better designs as [...] of data and a
consideration of their implications. [...] consists largely [...]
the conditions

the box, and comparisons of the behavior during the 1st, 3rd, 5th, . . .
and 19th minutes were made with the successive 10 minutes of the
experimental group the differences in behavior might have dis- 
appeared. As the procedure was actually carried out we may con- 
clude that something which was done to the puppies produced the 
differences in behavior, but we have no basis for concluding that 
it was the difference in time per se between trials which produced 
the difference. 
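
The matched comparison suggested for a massed control group can be sketched as follows. The per-minute yelp counts are invented, and the helper names are hypothetical; the sketch only shows which minutes of a 20-minute massed session line up with the ten in-box minutes of the distributed group.

```python
def in_box_minutes(exposures):
    """Ordinal minutes of the session spent in the box under the
    1-minute-in, 1-minute-out schedule: 1st, 3rd, 5th, ..."""
    return [2 * k - 1 for k in range(1, exposures + 1)]


def matched_counts(per_minute_counts, minutes):
    """Pick the counts for the given ordinal minutes (1-based)."""
    return [per_minute_counts[m - 1] for m in minutes]


minutes = in_box_minutes(10)

# Invented yelps per minute for a massed control kept in the box 20 minutes.
massed_control_yelps = [9, 9, 8, 8, 7, 7, 6, 6, 5, 5,
                        4, 4, 3, 3, 2, 2, 1, 1, 0, 0]
comparison = matched_counts(massed_control_yelps, minutes)
```

Comparing these ten values with the ten in-box minutes of the distributed group equates the two groups for elapsed time in the general situation, which the original comparison did not do.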

7. In studies on distributed practice in verbal learning, a problem 
which has plagued many investigators is how to “fill” the rest inter- 
vals so that the subjects won’t rehearse the task they are learning. 
The desire to prevent rehearsal is understandable since it avoids 
the very type of experimental error I am discussing. Distributed 
trials are introduced to discover what effect the rest intervals as such 
have on performance; therefore, it is desirable that any superiority 
in performance under distribution not be attributed to the extra 
learning which might occur with rehearsal. One of the tasks com- 
monly used to prevent rehearsal is color-naming. The subject is 
simply given a board on which patches of paper of various hues are
pasted and during the interval he names the colors at a fairly rapid 
rate. Subjects report that they cannot rehearse while carrying out 
this task. 

When I started a series of studies on distributed practice a fc"' 
years ago, color-naming was introduced as a standard rest-interyal 
filler. A finding which persistently showed up in these first studies 
was that subjects serving in distributed conditions made more overt 
errors in learning the tasks than did those subjects learning under 
massed conditions; this was true even though learning was actually 
more rapid under distribution than under massing. I took this differ* 
ence in error frequency to have considerable importance for it was 
relevant to existing theories. However, subsequent research 0 ^) 
showed that this difference in error frequency was due entirely to 
the naming of colors and not due to some subtle process taking place 
during the rest intervals as I had been trying to make out. Certainly 
the difference in error frequency was caused by the distributed
conditions but I had been blind to the fact that what was being used 
in the distributed intervals to prevent rehearsal had been responsible 
for the greater tendency to make errors. I had been drawing analyti-
as we have the control group, no bias will attend our results as a
consequence of such a factor.
since both groups have both tests, this practice should not bias the
results in favor of one group more than the other. 

2. A second source of change from first to second testing is extra- 
experimental experience. Thus, if in the above experiment subjects 
happened to attend a sociological lecture in which the topic was 
“race prejudice,” and if they attended it between pretest and post- 
test, the posttest scores might be influenced by this experience. Here 
again, however, unless the experimenter has evidence that more in 
one group attended the lecture than in the other, no bias is present 
although clearly some of the change from pretest to posttest may be 
accounted for by this. If two groups in any research have differen- 
tial extra-experimental experiences between pretest and posttest, it 
is quite clear that we could have very biased results and arrive at 
quite inappropriate conclusions as far as our experimental manipula- 
tions are concerned. 
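
The logic of the control group in the two points above can be put in arithmetic form. What follows is only a sketch with made-up scores; the numbers, the helper name `mean_gain`, and the variable names are illustrative assumptions, not data from any study. Whatever practice or extra-experimental experience adds to both groups alike drops out when the control group's gain is subtracted from the experimental group's gain.

```python
from statistics import mean

def mean_gain(pretest, posttest):
    """Mean posttest-minus-pretest change for one group (hypothetical helper)."""
    return mean(post - pre for pre, post in zip(pretest, posttest))

# Made-up attitude scores, for illustration only.
experimental_pre = [10, 12, 11, 13]
experimental_post = [16, 17, 18, 17]
control_pre = [11, 12, 10, 13]
control_post = [13, 14, 12, 15]

experimental_gain = mean_gain(experimental_pre, experimental_post)
control_gain = mean_gain(control_pre, control_post)

# Change shared by both groups (practice, outside experience) cancels here.
treatment_estimate = experimental_gain - control_gain
print(experimental_gain, control_gain, treatment_estimate)
```

If the two groups had differential extra-experimental experiences, the subtraction would no longer remove them, and the estimate would carry exactly the bias described above.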


I wish to digress for a moment at this point. I think it is a truism 
that the longer the interval between the pretest and posttest, the 
greater the probability that extra-experimental experiences will in- 
fluence the results. Nevertheless, there may be many cases where the 
experimenter wants to do long-term studies; that is, studies in which 
the interval between pretest and posttest may be several months or 
even years. Furthermore, he may wish to sample the time dimension 
at various points throughout the long interval. For example, if in
the above experiment the attitudes of the experimental group had 
changed to a greater extent than the control, he might wish to meas- 
ure the permanence of the change by testing again, say, after six 
months. In any such studies over time, the control group must be 
maintained in order to assess the influence of the experimental vari-
able. If the subject matter is such that repeated testing of the same
group is inadvisable, then as many experimental and as many control
groups as there are time intervals must be used. Extra-experimental 
experiences become increasingly important as the time interval 
grows longer. 

3. It is possible that changes may take place between pretest and 
posttest due to intra-organic growth processes of the subject. As 
Campbell points out, this could be quite true in young children 
where neuro-muscular growth occurs relatively rapidly. But, as long 
treatment. For certain studies a fourth group having only the post-
test will be included, but details of this can be found in Campbell’s 
and Solomon’s articles. 

If one is not interested in the interactive effects of the pretest but 
is interested solely in the effect of the experimental treatment as a 
means of generalizing the results to the population from which the 
samples were drawn, then clearly we should omit the pretest and 
simply give one group the experimental treatment, the other group 
not, then measure for differences. The use of the pretest may in 
many instances be no more than a useless practice that has grown 
up over the years. If subjects cannot be assigned at random to the 
groups, then certainly we need some means for checking their
equivalence and thus the use of the pretest has grown up as fairly
standard practice even if it is quite possible to assign at random. 
Furthermore, if the pretest is used, and if interactions between it
and the treatment occur, then the design which adds the group not
having the pretest (in order to “take out” the interactive effects) 
certainly must rest on some assumptions concerning randomness of 
assignment to groups. 

The whole point of this section, to repeat what I said initially, is 
to give a brief demonstration of how control groups, perhaps 
several in a single experiment, may be added for the purpose of 
pinning down and isolating effects of specific factors within a com- 
plex of factors. 

CONFOUNDING BY TASK VARIABLE WHEN MANIPULATING 
ENVIRONMENTAL VARIABLE 

It is my belief that errors which fit this category are infrequent. 
At least, there is some reason to believe that it should be true. The 
reason is that when manipulating an environmental variable the 
common procedure would be to use the identical task for all con- 
ditions. Yet, there are a few problems which do arise in such research 
and we should sample them. 

1. I have previously discussed the problem of the balancing of
progressive errors. Such balancing is also necessary to equalize for 
task differences when manipulating an environmental variable and 
when each subject is to serve in all conditions. In the area of verbal 
learning, whether dealing with nonsense syllables or prose passages,
we have no way of putting the units together into a series of tasks
and obtaining any guarantee that the tasks will be equivalent in
difficulty. The relative difficulty of the tasks can only be determined
by an empirical test. If we do not make this empirical test, we have 
no alternative but to counterbalance the environmental conditions 
equally over all tasks so that differences in difficulty (if these exist) 
will not bias the behavior for any one condition. We may have many 
ratings on verbal units, e.g., affectivity, meaningfulness, familiarity,
but when these units are put together in lists, the lists may differ 
in difficulty even though the ratings on individual units are equated. 
At least such tasks differ in difficulty frequently enough so that we 
cannot proceed with research without using some form of counter- 
balancing. I shall later have more to say about this problem of 
handling task dimensions in experimental design. Let me turn now to 
a concrete illustration of a possible task confounding when manipu-
lating an environmental variable.
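
One way to counterbalance environmental conditions equally over tasks is a cyclic rotation, sketched below. This is an illustration under stated assumptions, not a procedure taken from the text: the function name `counterbalance`, the condition labels, and the list labels are all hypothetical, and the sketch assumes there are as many tasks as conditions.

```python
def counterbalance(conditions, tasks):
    """Rotate conditions over tasks so each condition meets each task once.

    Returns one schedule per block of subjects; each schedule is a list of
    (task, condition) pairs. Hypothetical helper using a cyclic Latin square.
    """
    if len(conditions) != len(tasks):
        raise ValueError("this simple rotation needs as many tasks as conditions")
    n = len(tasks)
    return [
        [(tasks[i], conditions[(i + shift) % n]) for i in range(n)]
        for shift in range(n)
    ]

# Illustrative labels only.
schedules = counterbalance(
    ["massed", "distributed", "control"],
    ["list A", "list B", "list C"],
)
for block, schedule in enumerate(schedules, start=1):
    print(block, schedule)
```

With equal numbers of subjects in each block, every condition is paired with every list equally often, so a list that happens to be easier cannot favor one condition. The tasks keep a fixed order in this sketch; balancing the order of the tasks themselves would need a further rotation.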

2. To simplify the problem unmercifully, I will say that the 
investigators in this particular experiment were interested in the 
effect of shock on the learning and retention of verbal materials (5). 
A serial list of 15 nonsense syllables was used; 5 of these were fol- 
lowed by shock each time they were presented, 10 were not. The 
learning was carried to a performance criterion and it was found 
that the shocked syllables were learned more rapidly than the non- 
shocked. Was this difference due to shock versus no-shock? All 
subjects had the same 5 syllables shocked. If these syllables were 
less difficult than the other 10, the same results would have been 
found without shock as were found. If the shocked syllables were 
more difficult than the others, the findings minimize the difference in 
performance as a function of shock versus no-shock. We have no
way of knowing what to conclude from this experiment. The issue 
could have been resolved by using a control group which learned 
the list without shock or by systematically changing the shocked 
and nonshocked words from subject to subject so that, all subjects 
considered, no bias would have occurred. 
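
The second remedy, changing the shocked syllables systematically from subject to subject, might be sketched as follows. The helper names `shocked_sets` and `set_for_subject` are hypothetical, and interleaving the sets across serial positions is my assumption rather than anything the original investigators did; the 15-item list with 5 shocked items is taken from the description above.

```python
from collections import Counter

def shocked_sets(n_items=15, n_shocked=5):
    """Split serial positions into disjoint, interleaved sets (hypothetical helper)."""
    if n_items % n_shocked != 0:
        raise ValueError("this sketch assumes n_items is a multiple of n_shocked")
    n_sets = n_items // n_shocked
    # Set k holds positions k, k + n_sets, k + 2 * n_sets, ...
    return [list(range(start, n_items, n_sets)) for start in range(n_sets)]

def set_for_subject(subject_index, sets):
    """Rotate through the sets so successive subjects get different shocked items."""
    return sets[subject_index % len(sets)]

sets = shocked_sets()
n_subjects = 6  # any multiple of len(sets) keeps the rotation balanced
times_shocked = Counter(
    position
    for subject in range(n_subjects)
    for position in set_for_subject(subject, sets)
)
print(sets)
print(times_shocked)
```

Across the full rotation every syllable is shocked for the same number of subjects, so differences in syllable difficulty are spread evenly over the shock and no-shock conditions.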

3. A research problem which exists in widely different fields 
revolves around subject biases. Features of experimental tasks may 
provoke responses reflecting these biases and unless these features 
are adequately balanced, the biases may influence performance under
one condition more than under another. These biases are commonly
called constant errors. In psychophysical experiments many of these
have been named (e.g., space error, movement error, habituation,
and so on) and are, of course, phenomena for study by research in 
and of themselves. But, unless the investigator is interested in these 
constant errors elicited by the task presented the subject, he designs 
his experiment so that they will not influence one condition more 
than another. 

In animal experimentation these constant errors are a continual 
headache and rather extreme steps must be taken to prevent them 
from differentially affecting the results for conditions of the experi- 
ment. We may think of these in another way. Certain features of a 
task may have more “cue value” to an animal than others; that is, 
because of certain experiences or because of genetically determined 
reasons, all features of a task do not have equal probability of being 
attended to. If the investigator wants the animal to attend to a 
particular set of cues, all other cues must be balanced so that they 
will not bias the results, so that the investigator may state precisely the
nature of the task presented the animal. Let us take an example; let 
us suppose that we are going to determine the influence of the effect 
of magnitude of reward (an environmental variable) in learning a 
black-white discrimination. To simplify the problem, assume we 
have two reward magnitudes, small and large, and a different group 
of rats for each of these two conditions. A jumping stand is used 
as the apparatus in which to conduct the experiment. Since we want 
the rats to learn on the basis of black-white discrimination, we have 
several balancing procedures to accomplish if we want to get an 
unbiased estimate of the influence of the magnitude of reward on 
the black-white discrimination problem. 

First, our rats may not have equal propensities for black and white 
initially. Assume that we ignored this and arbitrarily chose to put 
the food for both groups behind the white card. Assume further 
that the rats had a strong white-going bias. Our results might show 
very rapid learning under both conditions and we might conclude 
that magnitude of reward was not an effective variable. Actually,
the animals probably didn't learn anything new; they simply exe-
cuted a habit they brought to the situation. To avoid such a possi-
two conditions, low and high meaningfulness of material. We give
two groups of subjects a constant number of learning trials on each 
task and measure retention after 24 hours. If we get a difference on 
our retention measurements for the two conditions can we attribute 
the difference to differences in meaningfulness? We tend to say of 
course we can. Assuming all other experimental problems have been 
properly handled, the only possible cause for the differences in
retention measurements was meaningfulness. At one stage in the 
development of our science I suppose such a finding would have 
been accepted. But, at the present level of development, and in terms 
of the problem stated, such a finding would be of little analytical 
worth. The problem is to determine the influence of meaningfulness 
on forgetting. By the above procedure we do not know whether 
that variable influenced learning, forgetting, or both. If meaningful- 
ness influenced learning the differences we measured after 24 hours 
must be attributed to meaningfulness; but is it because of different 
degrees of learning before the retention interval or because of differ- 
ential rates of forgetting due to the intrinsic nature of the material, or 
both? There is no way to tell from such a set of data. But, knowing
as we do that strength of association is a powerful variable determin- 
ing forgetting, we realize that in order to determine the effect of 
meaningfulness on forgetting, strength of response must be equiva- 
lent for the two levels of meaningfulness before the retention inter- 
val is introduced. In short, such an experiment in which a task 
variable is manipulated is confounded by an environmental variable 
(degree of learning). We have operationally distinguishable phe- 
nomena, learning and forgetting, and our confounding does not 
allow us to tell which phenomenon is being influenced by the 
variable. There are several studies in the literature (e.g., a, /y) 
using various task variables in which no consideration was given to 
this distinction between learning and forgetting. Since our variables 
may influence one and not the other, we must keep our references 
clear. 

To avoid such confoundings it became rather common practice to 
carry acquisition to a given level of performance for all conditions. 
Thus, we might take both groups, learning materials of different 
meaningfulness, to a criterion of one perfect trial on the assumption 
that by so doing the strength of responses or degree of learning was 
alpha waves, and so on, we are manipulating task variables. In order
to assess unambiguously the influence of these task dimensions on 
behavior the environmental variables must be constant for all con- 
ditions unless it is known that the specified environmental difference 
does not influence the behavior being measured. In running a T- 
maze experiment the number of potential confounding environ- 
mental variables is very large. Rats in different groups should be 
run at the same hour of the day because of diurnal variations; the 
“smells” around the maze should be constant for all groups; tempera- 
ture should be the same, and so on. If we are using the psychogal- 
vanic response and determining its relation to varying degrees of 
affectivity of words even the humidity must be held constant for 
each condition. In all cases, of course, when I say “held constant”
this does not mean that there cannot be any variation (although this 
would be desirable); it means that if there is variation in these pos- 
sible confounding variables, the variation is equivalent for all con- 
ditions so that no bias can enter. 

So much for such routine matters. I want to turn now to a rather
subtle confounding which arises in certain research areas. 

2. So far as I can determine, the confounding about which I wish
to speak now would occur only in certain types of learning experi- 
ments. More specifically, these experiments have two stages, namely, 
an acquisition stage and then some subsequent test for a different 
phenomenon. These would include studies of retention, extinction, 
and transfer. There could be three stages involved, as in acquisition, 
extinction, and spontaneous recovery, and these must necessarily be 
studied in that order. When we have these two-stage (or more) ex- 
periments and we wish to determine the influence of a task variable 
on the second-stage phenomenon, we must be sure that the perform- 
ance at the end of the first stage was equivalent for all conditions. 
The source of the design problem is the fact that if the task variable 
influences first-stage acquisition it is very difficult to be sure that 
the conditions are equal on one very relevant variable, namely, 
strength of association, at the end of the first stage. Let me trans- 
late this into a concrete illustration. 

Assume that we wanted to determine the influence of meaningful- 
ness of material on rate of forgetting. In order to study forgetting 
(second stage) we must first have learning. Suppose that we have 
is not allowed to fluctuate because it is known that at some fre-
quencies differences in intensity will produce small changes in phe- 
nomenal pitch. The same would be true in the visual modality. If 
luminance is being varied to determine its influence on some visual 
phenomenon, hue or wave length is held constant. Two investigators 
(22) had the idea that judgment of depth would be related to the 
brightness of colors of the objects whose depth was being judged. 
To work this out in the laboratory the traditional depth-perception 
apparatus was used in which there are two rods or tubes. One of 
these is fixed, the other variable. The subject's task is to adjust the 
variable rod so that it appears to be the same distance from him as 
is the fixed rod. In this experiment, the fixed rod was always gray,
the variable was one of six colors varying in brightness. The subjects 
were each given 100 trials with the gray tube always on the left. 
While this might introduce a bit of a space error I am not concerned 
with this. The results show that brighter colors are judged nearer 
to the subject than they actually are and the darker colors are 
judged further away than they actually are. But it seems to me that 
brightness and hue are confounded in this experiment. Are the 
differences due to brightness differences, to hue differences, or to 
both? We cannot tell; to do so would require variation in brightness
with hue constant. 

2. In Chapter 2, when discussing problems in dimensionalizing 
task characteristics, I pointed out the very serious problem of obtain- 
ing unitary dimensions. Since the reduction of task characteristics to 
unitary dimensions has not been accomplished with most verbal 
materials used in learning experiments we must be continually aware 
of the possibility that some unknown (or at least undimensionalized) 
characteristic may be partially correlated with the dimension we 
wish to manipulate. The problem, in miniature form, is much like
the one discussed at the end of the previous chapter; that is, the 
problem of correlated subject variables. But even if we have dimen- 
sions of units of verbal material which are unitary and exhaustive, 
the problem is not completely handled because we must place these 
units together in a task or list; these problems were discussed earlier. 
Let us see how this could disturb a simple study on transfer. 

Let us suppose we are going to use verbal tasks in a simple study 
of transfer from one list to another as a function of the similarity 
equivalent before the retention interval was introduced. Then, after
24 hours, differences which occur would be attributed to differential
rates of forgetting of the material. Certainly we would agree that
this design comes nearer to equalizing the degree of learning than 
does the previous method where number of trials was constant. 
However, even this is unsatisfactory for precise analysis. The reason 
is that if the manipulated task variable produces differences in acqui- 
sition rates, the two groups do not have an equal degree of learning 
as a result of attaining the same criterion because of the different 
rates in reaching the criterion. If we measure the degree of learning 
on the trial immediately following the criterial trial we would find 
they were not equal; thus, retention measurements might still be con- 
founded by degree of learning. The magnitude of the confounding 
will be directly related to magnitude of differences in rate of acqui- 
sition. There are at least two possible solutions to this problem; 
these have been given elsewhere ($ 6 , ^ 8 ) so I will not repeat them 
here. I will say only that in the case of retention studies an adequate 
equation of degree of learning before the retention interval has 
resulted in considerable change in our beliefs concerning the in- 
fluence of certain task and subject variables on retention. 

CONFOUNDING BY TASK VARIABLES WHEN MANIPULATING 
TASK VARIABLE 

This refers to cell 5 and represents a confounding which I have 
judged not only to occur with some frequency in research but also 
to be one for which we have no pat solutions. It will be remembered 
that task dimensions may be measured along physical scales or 
psychological scales. Although task confoundings may occur when 
manipulating a task variable along a physical scale this seems to be 
less likely (and the solution easier) than when the dimension is a 
psychological one. I will give illustrations from both areas but my 
major emphasis shall be on psychological task dimensions. However, 
let us start with the case of a task dimension measured along a 
physical scale. 

1. If one wishes to determine the effect of variation in cycles per
second of the sound wave on phenomenal pitch, all other dimen- 
sions of the sound wave are held constant. Intensity, for example, 
the differences in difficulty are fairly great. If the differences in diffi-
culty are large, we had better try a new experiment. 

3. I am responsible for one of the most beautiful illustrations that 
can be found of task confounding when manipulating a task vari-
able. I have analyzed this situation elsewhere (^j) so shall mention
it briefly here. This confounding was between intralist similarity
and interlist similarity. I was interested in interactions between intra-
list similarity and distributed practice on learning and retention.
But, at least when using nonsense syllables for the learning material 
and the same subjects in all conditions, manipulation of intralist 
similarity led to an inverse “manipulation” of interlist similarity and 
a consequent distortion of retention measurements. This cannot 
happen when subjects are used in only one condition. Needless to
say we have dropped counterbalanced designs for studies of this 
type. 

4. Here is an experiment dealing with the retention of different 
materials (^a). The problem was to study recognition of three dif-
ferent materials judged to have different values for invoking ego-
involvement. Each subject was given three cards. On one card the
subject wrote his given or first name. On a second card he copied a 
slogan, the same slogan being copied by all subjects. Finally, on the 
third card each subject was asked to copy a one-inch square. The 
subjects were divided into five groups with retention tests being 
given after a different interval of time for each group. On the reten- 
tion test, using the recognition method, the subject was given the 
three stacks of cards resulting from putting cards of all subjects to- 
gether. From each stack the subject was asked to choose his own par- 
ticular card. The results were presented in terms of correct recog- 
nition of names, slogans, and squares after varying intervals of time. 

The idea of the experiment was that the person’s own name would 
invoke the greatest ego-involvement, the slogan less ego-involve-
ment, and the square still less, and the recognition would be directly
related to this ego-involvement. We might object to bringing ego-
involvement into this picture at all but it is not with that point
which I am concerned. Take the two extreme materials, names and 
squares. Obviously all the subjects would not have the same name. 
The investigators, therefore, included at least four cards for each 
name so that recognition would not occur solely on the basis of the 



between the lists. We will use 10-item lists made up of units which
have been carefully scaled for similarity so that we can clearly differ- 
entiate, say, three degrees of similarity. To simplify the procedure 
we will further assume that we use three random groups of subjects, 
one group for each degree of similarity. All groups learn the same 
list for their first list. We have constructed three second lists, one 
which has high similarity with items in the first, one with medium 
similarity, and one with low similarity. Thus, the items in the three 
second lists are different. Now suppose we conduct the experiment 
with our basic measure of transfer effects being some performance 
on the second list. If we get differences can we attribute these to 
the similarity variable? Although we have a number of published 
studies which have used this procedure, it seems to me that we can 
draw no conclusions concerning transfer as a function of similarity 
at this point. Since the three second lists were different lists they 
may have varied on characteristics which influenced their difficulty 
so that the differences observed may have been a function of this 
difficulty rather than a function of the similarity between the two
lists. The second lists might have differed in meaningfulness, in 
intralist similarity, in affective tone, or perhaps other factors. 

How do we solve this problem for the transfer experiment? For- 
tunately it can be solved either empirically or by appropriate design. 
What we need to do is discover if the three second lists do or do not 
differ in difficulty when not preceded by the first list. We might do 
this by having three control groups which learn only a second list. 
It is somewhat more convenient to handle by having half of the sub- 
jects in each group reverse the order in which the two lists are 
learned. Indeed, in this particular case we could have all subjects 
learn in reverse order and then determine transfer effects on the 
single common list. This latter procedure would be satisfactory if we 
do not find appreciable difference in difficulty for our three lists 
but if we do, we are then faced with possibilities of differences in 
degree of first-task learning. However we do it, we must show that 
the lists do not differ appreciably in difficulty; if they do we must, of 
course, make adjustments in our estimates of the transfer effects 
which can be attributed to differences in similarity. There are 
several ways by which this can be done but none is satisfactory if 
of learning. We got wide differences in error frequency and a small
but insignificant difference in learning which was in favor of the
guessing group. A perceptive reviewer pointed out that even this
small difference might be attributed to the fact that when guessing
the subject may have hit upon a correct response and, having hit it
accidentally, may have fixated it. Had this effect been grossly ampli-
fied so that differences in learning were significant, we might well
have attributed the differences to something intrinsic in the process
of making errors and not to the fact that in making more errors the
subject had a higher probability of hitting upon the correct re-
sponse.

5. I have no further specific illustrations to give, but I can imagine 
a number of situations in which the confounding of task variables by 
other task variables could easily occur. Supposing we wanted to 
determine the influence of similarity of multiple-choice alternatives 
of a paper-and-pencil test on scores on the test. Could we vary 
similarity among the alternatives and keep meaningfulness, relevance, 
etc., of alternatives comparable while varying similarity? Could we
manipulate threat-provoking capacity of prose passages and have 
those passages equal on all other factors which might affect perform- 
ance? If we vary the political slant of speeches to determine effect 
on learning can we keep all other dimensions of such material 
equivalent? Whenever we manipulate a task variable based on a 
psychological dimension we are confronted by a potentially dan-
gerous research situation. A careful study of other possible ways by
which the material might vary in addition to the way we want it to 
vary will prevent us in many cases from arriving at questionable 
cause-effect conclusions. 

CONFOUNDING BY ENVIRONMENTAL AND TASK VARIABLES 
WHEN MANIPULATING SUBJECT VARIABLES 

I am placing cells 7 and 8 together in this final section for there 
is little to say which has not already been said in the previous sec-
tions. The major research problem in manipulating subject variables
centers around confoundings with other subject variables and this 
situation has been discussed at length. However, confounding by 
task and environmental variables may take place in exactly the same
name. Nevertheless, this is a quite unsatisfactory expedient, simply
on a probability basis. Assume there were 100 subjects. With four
of each name the probabilities of choosing the correct one by chance
would be one in four, whereas the probabilities of choosing the
square by chance would be one in 100. Add to this the differences
which may have existed in color of ink, peculiarities of writing, and 
so on which would serve as cues for recognition over and above any 
idea of ego-involvement in one’s writing and we see that the results 
of the experiment just do not seem to have any bearing on the very 
real problem of motivation and retention. 
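
The probability argument can be checked with a line or two of arithmetic. The sketch below uses only the figures assumed above (100 subjects, four cards bearing each name); the variable names are mine, and it assumes, as the argument does, that a subject who is purely guessing chooses among the cards bearing his own name.

```python
from fractions import Fraction

n_subjects = 100     # number of subjects assumed in the argument above
cards_per_name = 4   # cards bearing each name

chance_name = Fraction(1, cards_per_name)   # guessing among one's own name
chance_square = Fraction(1, n_subjects)     # guessing among all squares
ratio = chance_name / chance_square

print(chance_name, chance_square, ratio)
```

On these figures a pure guess is 25 times as likely to pick the correct name card as the correct square, before any difference in retention enters at all.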

At this point I would like to add a general caution concerning 
this matter of biasing results by conditions which do not have 
equal “guessing potential.” Suppose we are manipulating a task vari- 
able (or environmental variable or even a subject variable) and the 
particular variations allow for different response probabilities based 
on guessing. But, if differing guessing potentials is not the reason 
for manipulating the task variable, we have a confounding. I men- 
tioned this matter when considering a study in the previous chapter 
but its full implication needs to be seriously considered in almost 
every experiment. We must remember that guesses are seldom ran- 
dom; they usually reflect response biases and if our manipulations 
allow for biases of different strengths to operate we may err in our 
interpretation. Thus, if we do a study on verbal threshold recog- 
nition (e.g., 5/) as a function of frequency of usage of words or 
letters, and if the subjects are instructed to guess, their guesses are 
likely to be those letters or words with greatest frequency of usage. 
The subject may not actually see the high frequency words or 
letters any sooner than those of low frequency but if he guesses he is 
most likely to guess those of high frequency and thus appear to 
have seen them sooner. 
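
A minimal sketch may make the argument concrete. It assumes, purely for illustration, that a word is either genuinely seen with some probability or else guessed, and that guesses fall on a word in proportion to its frequency of usage. The function name and all numerical values are hypothetical.

```python
def apparent_recognition(p_seen: float, p_guess: float) -> float:
    """Probability of a correct report when unseen words are guessed.

    p_seen  : probability the word is genuinely perceived at this exposure
    p_guess : probability that a blind guess happens to be this word
    """
    return p_seen + (1.0 - p_seen) * p_guess

# Hypothetical values: both words are truly seen equally often,
# but guesses favor the high-frequency word.
p_seen = 0.30
high = apparent_recognition(p_seen, p_guess=0.10)
low = apparent_recognition(p_seen, p_guess=0.001)

print(f"high-frequency word: {high:.3f}")
print(f"low-frequency word:  {low:.3f}")
```

Although perception is identical for the two words by construction, the high-frequency word is reported correctly more often, which is exactly the artifact described above.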

This guessing may come up in many situations and all we can 
really do to protect ourselves is to analyze carefully the situation to 
see if guesses can be made and to ask whether they will influence the 
conditions differentially. A student of mine once did a study (2p) 
in verbal learning in which one group of subjects was instructed to 
guess frequently and another was instructed never to guess. Our 
interest was in producing wide differences in overt errors to see if 
in turn there was any relationship between these frequencies and rate 

because of a special research problem which is, in a sense, a combined 
statistical-design problem which I want to mention briefly at this 
point. 

In my discussion of confounding of stimulus variables, I intention- 
ally kept the designs simple. For the most part these confoundings 
are in no way eliminated or mitigated by research in which two or 
more variables are simultaneously manipulated. That is, the reason- 
ing we have applied to the simpler one-variable experiment may also 
be applied to each variable in a multivariate design. However, as 
empirical and theoretical analysis of an area of research develops, 
multivariate designs may become almost mandatory if adequate 
statistical tests of certain phenomena are to be made. For example, 
if one has an hypothesis that drives summate, the design for testing 
this can be quite simple. But, if the hypothesis specifies an interaction, 
say, between strength of drives and the summation function, an 
orthogonal design (in which both variables are manipulated simul- 
taneously) is virtually necessary in order to make a statistical test 
of the interaction phenomena. In recent years it has become neces- 
sary to make distinctions between variables which influence asso- 
ciative processes and those which influence only performance. 
To give this distinction empirical substance it is often neces- 
sary to use orthogonal designs; at least, such designs are a very 
efficient way to provide the separation. For example, in varying 
intensity of stimuli (conditioned or unconditioned) in conditioning 
experiments the variable may be orthogonal to itself in successive 
stages of performance as a means of separating the two components 
(e.g., 19). 

Occasionally a variable is by its intrinsic nature constituted of 
two or more components, either of which may influence behavior. 
In order to determine how much each is contributing to perform- 
ance the multivariate design again provides a most efficient way. To 
illustrate this, consider the variable of ratio of reinforcement in 
learning studies. This refers to ratio between number of trials given 
and number of trials reinforced. Thus, we might give one group 100 
trials with reinforcement after each trial and another 100 trials with 
reinforcement after every other trial (on the average). Behavior 
might differ in two such conditions either because of total reinforce- 
ment received or because of something intrinsic to the pattern of 

way as they do in cells 2 and 4. That is, when manipulating a subject 
variable, confounding by a task variable may take place in the same 
manner as confounding by a task variable when manipulating an en- 
vironmental variable (cell 2). And, confounding of a task variable 
by an environmental variable (cell 4) is analogous to confounding 
by environmental variable when manipulating a subject variable. 
As was the case for cell 4, these confoundings in cell 7 usually occur 
when there is a two-stage experiment. Such confoundings may have 
occurred (e.g., a, i^) but I see no point in reporting these since 
their solution requires no special techniques not already discussed. 


RESPONSE ANALYSIS 

The discussion of research errors thus far has been concerned 
with stimulus confounding. I want to turn now to research problems 
which may loosely be said to arise in analyzing and interpreting 
response measurements. Again, “errors” in this aspect of research 
may be minor or major depending upon the magnitude of the dis- 
crepancy between what is concluded and what can properly be 
concluded. 


STATISTICAL ANALYSIS 

None of us needs be reminded that erroneous conclusions con- 
cerning behavioral relationships are drawn as a consequence of 
inadequate or inappropriate statistical analysis of response measure- 
ments. Graduate training programs for research workers in psychol- 
ogy are including an increasingly heavy amount of formal course 
work in statistical and mathematical techniques. Psychologists have 
been remarkably ingenious in devising statistical procedures ... 
problems peculiar to our science but ... upon techniques developed 
in other ... indeed, ... years ago ... current ones to perceive the 
amazing change ... statistical errors from the present survey. I 
mention them now in part for sake of completeness and in part 

inappropriate since such differences in frequencies would be ex- 
pected under a single condition at different stages of learning. 

2. Assume that we are interested in social interaction of small 
groups when the individuals brought together in these small groups 
are strangers. Furthermore, assume we hold the hypothesis that the 
greater the strain or hard work required of the group the higher the 
morale that develops. To test this we give three groups problems to 
solve but with the problems for one group being relatively easy, 
those for another somewhat more difficult, and those for the third, 
very difficult. Under appropriate incentive conditions the groups 
work until the problems given them are solved. We measure how 
long it takes for each group to solve its problems and, as a measure 
of morale, the number of times the word “we” is used during the 
problem-solving session. We find that time to solve increases with 
difficulty (as expected) and that total number of “we’s” likewise 
increases. This might seem to support our prediction but we can 
see that the longer the time to solve the problem the longer the 
period during which “we’s” can be spoken. The response measure 
should be converted into, say, number of “we’s” spoken per unit of 
time worked. 
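
The conversion suggested here is a simple rate. The sketch below uses invented session data (the group labels, times, and counts are all hypothetical) in which the raw totals rise with difficulty while the rate per minute does not change at all.

```python
# Hypothetical data: (group, minutes worked, total "we" count).
sessions = [
    ("easy", 10.0, 20),
    ("moderate", 20.0, 40),
    ("difficult", 40.0, 80),
]

def rate_per_minute(count: int, minutes: float) -> float:
    """Convert a raw count into a count per unit of time worked."""
    if minutes <= 0:
        raise ValueError("time worked must be positive")
    return count / minutes

rates = {name: rate_per_minute(count, minutes) for name, minutes, count in sessions}
for name, minutes, count in sessions:
    print(f"{name:10s} total={count:3d}  per minute={rates[name]:.2f}")
```

With these made-up figures the totals quadruple from the easy to the difficult group, yet every group speaks two “we’s” per minute, so the totals alone say nothing about morale.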

3. In a study of problem-solving (/) experimental and control 
groups were ostensibly given the same problems to solve but by 
the nature of the variable manipulated, the experimental subjects 
would necessarily have to take longer to show that they could solve 
the problem. Yet time-to-solve was presented as a critical response 
measure. 

4. Assume that we introduced a new micrometer into a factory 
inspection system. To discover its effectiveness we have one group 
continue using the old system and another the new one. After a 
month, we count number of rejects produced by the two systems 
and find there is no difference. But unless the number of rejects is 
considered in conjunction with total number of products inspected 
our response measure does not have the meaning we want it to have. 

You may feel that the above illustrations represent errors so 
obvious that you would never make such a mistake. I hope not, but 
it is at least worth a warning to carefully inspect your response 
measure to see if it can possibly be an artifact or psychologically 

reinforcement. In order to filter out the influence of each factor 
separately we might use a three-variable design in which magnitude 
of reinforcement, number of trials, and ratio of reinforcement were 
all manipulated orthogonally to each other. 
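
As a rough sketch of what such a three-variable orthogonal layout looks like, the cells can be enumerated by fully crossing the levels of each variable. The particular levels below are hypothetical, since none are specified in the text.

```python
from itertools import product

# Hypothetical levels; the text does not specify any.
magnitude = ["small", "large"]   # magnitude of reinforcement
n_trials = [50, 100]             # number of trials given
ratio = [0.5, 1.0]               # proportion of trials reinforced

# Orthogonal (fully crossed) design: every level of each variable
# appears with every combination of levels of the other two.
cells = list(product(magnitude, n_trials, ratio))

for mag, trials, r in cells:
    reinforced = int(trials * r)  # number of reinforced trials in this cell
    print(mag, trials, r, reinforced)
```

Note that the cells with 100 trials at a ratio of 0.5 and with 50 trials at a ratio of 1.0 both contain 50 reinforced trials. Comparisons of this kind are what allow total reinforcement to be separated from the pattern of reinforcement.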


INAPPROPRIATE RESPONSE MEASURES 

We sometimes think that certain responses are inappropriate 
because we don’t believe they are relevant to the behavior being 
studied. I would not care to take up such criticism on strictly 
scientific grounds but I might on social-scientific grounds. For the 
linking of a response measure with a phenomenon is essentially a 
matter of definition and if our response measure is reliable, it is hard 
to quarrel with such definitions on scientific grounds provided there 
is no transgression on already defined phenomena. But, with ex- 
treme cases, quarrels can easily be instigated. If I defined a psychotic 
as one who can run 100 yards in less than 9 seconds, I feel confident 
that many would believe I was a very fast runner. So then, the issue 
with which I wish to deal is not a definitional one for I have covered 
this earlier; rather, it is a matter of inappropriate response measures 
in the sense that differences (or lack of them) in the data from an 
experiment may be a consequence of artifacts in the measuring 
process. Some illustrations will show the kind of thing about which 
I am thinking. 

1. A study (16) was done to compare rate of learning of similar 
items when groups of such items were bunched together in the list 
as compared with the case where they were scattered throughout the 
list. The two lists were presented for 14 trials. As it turned out, the 
list with bunched items was learned significantly faster than the list 
with no bunching. Number of overt errors made was presented as 
an auxiliary measure and the results show more errors were made in 
the early part of learning in the bunched-item list but fewer on 
later trials. These error differences are used to support a theoretical 
interpretation of the differences in learning rate. However, if the 
learning scores are adjusted to be equal (irrespective of trials), the 
error frequencies are likewise equal. In short, comparing error 
frequencies for the two conditions at different stages of learning is 

problems that we cannot handle at our present level of mensuration 
development. 

The second issue is one that confronts us many times but it is 
only in cases where it is markedly exaggerated that we pay much 
attention to it. The exaggerated picture can be obtained by examin- 
ing data collected by members of my undergraduate course in 
experimental psychology. A card is prepared on which five words 
are repeated over and over again in random order. The five words 
are red, blue, green, brown, purple. These words are printed in 
color but never in the color indicated by the word. Thus, the word 
red is printed in blue, brown, green, and purple, but never in red. 
As one task, subjects read the words on the card as fast as they can, 
the response measure being time taken to read. As the second task 
the subjects name the color in which each word is printed. As 
might be expected, the second task produces very heavy interference 
and the time taken to name all the colors on the card we used was 
about 250 seconds as compared to 50 seconds required when merely 
reading the words. For each task successive trials (once through the 
card) were given under massed and under distributed practice. For 
both tasks a slight increase in time to read under massed trials 
occurred over a series of trials. Under the distributed conditions 
performance on both tasks improved. In the case of the highly inter- 
fering task the improvement was roughly from 250 seconds to 200 
seconds over 8 trials. For the reading task the improvement was from 
50 to 45 seconds. Superficially it would appear that distributed prac- 
tice facilitated the interfering task more than it did the reading task 
and by any conventional statistical test this would be true. But in 
terms of significance of behavioral changes is this true? Behaviorally, 
the reading performance, being so close to the asymptote or a 
physiological limit could be improved but little and it might well 
be that the 5 seconds improvement in score actually represents a 
far greater behavioral change than does 50 seconds in the interfering 
task. What I am saying, of course, is that our response measure, 
mean change in seconds, may be reflecting horses in one case and 
apples in another. Again, I cannot offer a general solution to such 
problems; but, we must not ignore them. 

meaningless. Some very competent investigators have made these 
so-called “obvious” errors. Let me give you one more fictitious 
illustration by way of warning. 

5. Although we are often belabored about our preoccupation with 
differences in mean performance in our data, most of us still view 
differences in variances among groups treated differently as a dia- 
bolical caprice which forces us to take time out to adjust these dif- 
ferences so we may proceed with statistical analyses. I suspect our 
reluctance to make anything of behavioral significance out of differ- 
ences in variance arises from our feeling that such differences just 
inevitably occur when we have fairly large differences in mean 
performance. Yet, we should not overlook the possibility that in 
certain forms of research the difference in variance may be very 
meaningful behaviorally with or without mean differences. When- 
ever we see the possibility that our manipulated variable might 
cause some subjects to “move” in one direction and some to move in 
the other, I would say that the differences in variance would have 
psychological meaning. Thus, if we were doing an experiment on 
prestige suggestion, we might give subjects prose passages and assign 
names of well-known authors to them in some cases and not in 
others. The effect of the names might be to raise ratings of liking-of- 
passages by subjects who liked previous work of the authors and to 
lower for those who did not. In such a case our means might reflect 
no difference between well-known and unknowns but the variances 
would. 
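
A small numerical sketch shows how this can happen. The ratings below are invented for illustration (a hypothetical 1 to 9 liking scale); they are not data from any study.

```python
from statistics import mean, pvariance

# Hypothetical liking ratings for the same passages.
unnamed = [4, 5, 6, 5, 4, 6]   # no author named
# With a well-known name attached, admirers move up and detractors move down.
named = [8, 8, 8, 2, 2, 2]

print("means:    ", mean(unnamed), mean(named))
print("variances:", pvariance(unnamed), pvariance(named))
```

The two sets of ratings have the same mean, so a test on means would show nothing, while the variance under the named condition is many times larger. In this constructed case the whole effect of the manipulation lies in the spread.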

NONEQUIVALENT RESPONSE MEASURES 

There are two issues involved here, but they converge. The first 
I shall dispose of quickly. We still occasionally find ourselves com- 
paring changes in apples and changes in horses. For example, it 
would be desirable to know how the forgetting of a pursuit-rotor 
habit compares with forgetting of a verbal habit but I know of no 
case in which such a comparison or a similar one has been made 
which will stand inspection. This matter has been discussed else- 
where (33). I do not believe there is a general solution to this prob- 
lem. We must face the fact again that there are certain research 

scientific productivity. Essentially, I do not think the problem of 
generalization here is any different from what it is in the staid labora- 
tory study where, say, a group of volunteer Freshmen women are 
used as subjects. Yet, at the same time it seems to me that the tempta- 
tion to generalize is greater in the case of the mail-type study than 
in the case of the female-type study. My point is that we must 
recognize the limitations to generalization in both situations. 

But we do have published mail-survey studies in which the intent 
is to generalize to a population and here we run into very real 
dangers. For example, in order to develop new scoring keys for an 
interest inventory 650 inventories were sent to 650 people holding 
industrial relations positions (23). Approximately 60 per cent were 
returned. What can we make of the interest patterns of these 60 per 
cent that we also know applies to the total group and that allows us 
to develop a key for predicting success in industrial relations posi- 
tions? 
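
The danger can be illustrated with a toy population. Everything in the sketch below is hypothetical: ten people, invented interest scores, and an assumed tendency for those with higher scores to return the inventory.

```python
from statistics import mean

# Hypothetical population of ten people: (interest score, returned the inventory?).
# By assumption, those with higher scores are more likely to respond.
population = [
    (9, True), (8, True), (8, True), (7, True), (6, True), (6, True),
    (4, False), (3, False), (3, False), (2, False),
]

respondents = [score for score, returned in population if returned]
everyone = [score for score, _ in population]

return_rate = len(respondents) / len(population)
print(f"return rate: {return_rate:.0%}")
print(f"mean among respondents: {mean(respondents):.2f}")
print(f"mean in full group:     {mean(everyone):.2f}")
```

With a 60 per cent return the respondents' mean overstates the mean of the full group. In an actual mail survey the scores of those who did not reply are unknown, so the size and even the direction of such a discrepancy cannot be checked from the returns alone.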

LABORATORY VARIABLES 

I have divided the manipulable variables into three classes, namely, 
task, environmental, and subject. The most general law we could 
have, say, of relating an environmental variable and behavior, would 
be one which maintained its integrity irrespective of the task used, 
irrespective of the values of other environmental variables, and irre- 
spective of the sample of subjects. Right off hand I don’t think of 
any behavioral law that we have which fits these requirements, nor 
do I believe we will find such. What are the problems involved in 
generalizing from a set of data? 

Subject variables. The matter of generalizing to a population from 
a sample as discussed above refers, of course, to subject variables. As 
is well known, the bulk of the relationships derived from laboratory 
work in psychology are literally applicable only to college students 
or to white rats. To be very accurate we would even have to say 
that these laws may not be applicable to college students in general 
since there has been no systematic sampling. And white rats of cer- 
tain strains have been favored over others. 

As far as I am concerned it is a perfectly legitimate enterprise 
to study white rats per se without any intent of generalizing beyond 

GENERALIZATION OF FINDINGS 

With or without the aid of theory, the long-term purpose of 
research is to develop general laws or relationships. These laws sub- 
sume the particular; they envelop the detailed findings. Such laws 
might seem to be the inevitable outcome of science but this is not 
quite the case. Rather, degree of generality of laws is determined by 
research directed by judicious consideration of sampling problems. 
I will discuss three somewhat different aspects of these problems. 
Again, if errors occur, they are identified by the discrepancy be- 
tween what is concluded and what can be concluded. 


SURVEY STUDIES 

To obtain a description of attitudes, beliefs, habits, preferences, 
and so on of a specified population, the entire population or a sample 
representative of it must be measured. These sampling techniques 
have been developed to a very high level by organizations of poll- 
sters. It is not my intent to examine these techniques, but I do wish 
to mention briefly the implications of survey studies which use mail 
questionnaires since such studies persistently crop up in psycho- 
logical journals. The major focus is on the nature of the conclusions 
which can be drawn. 

In one study (57) 467 questionnaires or information sheets were 
mailed to members of the staff of a technological institute. The 
questions asked each recipient to indicate how many scientific pub- 
lications he had produced, the number of technical journals regu- 
larly read, and a lot of other embarrassing questions. The total num- 
ber of questionnaires returned was 194, which is 42 per cent. The 
intent of the questionnaire was to discover what factors (e.g., age, 
training, work habits) correlate with scientific productivity. Now 
certainly there is nothing wrong with determining correlations 
among the various indices; we simply must be very careful in our 
conclusion concerning the meaning of these relationships. We have 
no idea as to whether these relationships would obtain among those 
who did not return the questionnaire. We have two populations, one 
whose members did return the sheets and one whose members did 
not. These two populations may well differ on factors related to 

these classes were held constant at different values the obtained 
relationship might disappear, might be modified, or might not change 
at all. 

The number of potentially relevant environmental variables is 
enormous; the number of actually relevant ones may be small. The 
solution to these problems may come about in two ways. First, by 
haphazard differences in environmental conditions from experiment 
to experiment and from laboratory to laboratory, a wide variety of 
settings of these environmental variables will have occurred. Thus, 
if a large number of studies on the influence of a given variable on 
conditioning has been done, and if other environmental variables 
have been held constant in each experiment but at different levels, 
and finally, if the same relationship has been found consistently, a 
post-hoc accounting of these variables attests to their irrelevancy. 
Secondly, we will have systematic attempts to determine the gener- 
ality of a phenomenon by determining how it is influenced by other 
potential environmental variables. The more variables we can change 
with the phenomenon remaining unchanged the greater the gener- 
ality of this phenomenon. And of course, each new positive test 
adds to our confidence that extensions to other situations are likely 
to give the same results. But in any event, there is no easy solution 
to this problem just as there is no quick resolution to the problem 
in the case of subject variables and task variables. 

Task variables. In certain respects, the path of our science toward 
attaining generalizations across tasks is more obscure than that for 
subject and environmental variables. To take a simple illustration, 
suppose I set about to determine the influence of distributed practice 
on learning, and I want to be able to generalize across tasks. I could 
pick a number of different tasks which I think are different (i.e., 
which involve different or uncorrelated subject skills) but my 
judgment of what constitutes different skills may be hopelessly in- 
adequate or erroneous as also may be the pooled judgments of many 
persons. The number of ostensibly different tasks which might be 
devised is almost without limit. As I indicated in an earlier chapter, 
I have suggested at various times to my colleagues that we might 
make a start toward the solution of this problem by assembling a 
wide variety of tasks which we thought were different and then 
determining the communalities of the skills by factor analysis. The 

white rats. So also can we study monkeys, stentors, atoms, butter- 
flies, ocean waves and mountain tops. But if we say that laws de- 
veloped from white rats are applicable to college students without 
empirically testing this we are making a frightful leap. Even if we 
insist that there must be some behavioral laws which hold for all 
living creatures the correctness or incorrectness of our insistence 
can only be gauged by empirical studies. Using the white rats or 
monkeys as a source of hypotheses about human behavior is common 
practice (and it can work the other way too), but no scientific 
process that I know abrogates the making of empirical tests of the 
hypotheses. 

It is perhaps too early in the development of our science to be 
overly concerned about the matter of systematically determining 
the generality of laws from one species to the next. Certainly within 
the species Homo sapiens we have as yet no methodical plan for ex- 
ploring the limits or generalities of relationships. We are still so 
engrossed in determining reliable phenomena and variables which 
relate to them within very restricted populations that any generality 
we may have has occurred as a result of haphazard or fortuitous use 
of samples from different populations. So what can we say about this 
matter? First, we must realize that our present laws may be very 
restricted because of the restricted range of subjects used. Second, 
our science must eventually make systematic attempts to determine 
the generality of laws, not only within species, but across species. 

Environmental variables. To make a general statement concerning 
the effect of a given environmental variable, the prerequisite is an 
adequate exploration along the “length” of that variable. If we 
examine the influence of intensity of conditioned stimulus on rate 
of conditioning and explore the range, say, between 70 and 100 
decibels, we have inadequately explored the dimension and are in 
no position to make a generalized statement concerning the influence 
of this variable even for the restricted set of conditions of the experi- 
ment. This is the first issue as I see it for this class of variables. The 
second is more difficult to bring into perspective. Having determined 
the influence of a given environmental variable on behavior in a 
particular situation, we must realize that this relationship holds only 
for the values at which other environmental variables, subject vari- 
ables, and task variables were held constant. If the variables within 

But, one may persist, since the principles of behavior obtained 
in the laboratory may be restricted to situations in which certain 
other variables are held constant, how can we be sure that the 
manipulated variable which is effective in the laboratory will be 
effective in the field situation? If distributed practice facilitates the 
learning of nonsense syllables in the laboratory can I be sure that 
distributed practice will facilitate the memorizing of a foreign lan- 
guage vocabulary in the classroom? Of course not. The issue is 
simply another manifestation of the problem of arriving at scientific 
generalizations about phenomena which are not intrinsic to a highly 
specific set of conditions. Laboratory studies give ideas about vari- 
ables which might affect performance in real-life situations, but 
the degree of confidence with which the generalization can be 
made from laboratory to field depends upon the degree to which 
many possible counteracting and interacting variables have been 
studied. The proof of generality lies in a test for it; the labora- 
tory situation in this context may be thought of as a highly efficient 
means for identifying variables which may be important for the 
field, but the proof of this lies in the field test where conditions 
would be less highly if at all controlled. Not even the most advanced 
sciences can avoid this test even though the laboratory be con- 
structed to simulate as nearly as possible the field conditions. The 
laboratory is the home of science, not of technology. 

factors would define the general subject capacities required by the 
tasks and research could then be undertaken on the influence of 
variables on single tasks which represent each factor. In the area of 
motor learning some success in this direction has been achieved {/^). 
But I am told that present methods of factor analysis are essentially 
inadequate for such ventures. If this be so, then I think we should 
develop methods of analysis which are appropriate. In the long run 
of our science some systematic analyses of tasks must be accom- 
plished. 

As I look back over the problems of scientific generalization as I 
have discussed them I would hope my analysis is in error and that 
the task is not as gigantic as it appears to me to be. If one measures 
his own lifetime research efforts against the known work which 
appears to lie ahead it measures but as a shovel of sand in the vastness 
of the Sahara. The saving grace is that even a shovel of grains of 
sand must be in some way representative of all sand and that the 
grains within the shovel have an undeniable fascination in themselves. 

LABORATORY TO FIELD 

Experimental research in psychology (as well as in any science), 
research in which the investigator controls variables, is an abstraction 
in the sense that it never duplicates a real-life situation. The very fact 
that variables are controlled makes this apparent. So how, one may 
ask, can we apply the principles derived in the laboratory to real-life 
situations? How can we generalize from the laboratory to the field? 
Why not do the research in the field in the first place? 

Let us first understand that the laboratory is not as divorced from 
reality as some would have us believe. Does the subject leave his 
hates, his skills, his capacity to learn in the dormitory when he 
comes to the laboratory? Does he come to a room in which there is 
no temperature, no stimulation, no social interaction? Is the learning 
of a list of nonsense syllables or the judgment of distances totally 
unrelated to what the subject does in real life? Of course not. The 
only major difference between the laboratory and everyday life 
is that variables other than the ones in which the investigator is 
interested are not allowed to “roam” at will. 

which would be considerably changed by the flux which is science 
were a second analysis made a few years hence. 

If the above paragraph suggests some confusion as well as the 
healthy, normal, expected changes in a vigorous discipline such as 
psychology is today, then I think it is accurate in its implication. 
Indeed, I must spend considerable time pointing out sources of con- 
fusion if I am to adequately reflect our science as it exists today. I 
will make no pretense that this confusion can be ameliorated appre- 
ciably; there are too many divergent trajectories of thought to 
expect this even if it were possible and desirable. The best I can hope 
to do is identify some of the sources of confusion and disagreement 
and try to provide some common base by which the disagreements 
can be described. 

Ideal versus actual explanatory attempts. Throughout our discus- 
sions of problems related to explanatory attempts, it will be wise 
to keep in mind a distinction between the ideal of explanation and 
the explanatory attempts undertaken by a “typical” research psy- 
chologist. To the philosophers of science, we owe a considerable 
debt, one portion of which is for their penetrating analyses of the 
formal aspects of concepts and laws in theoretical systems. By this I 
mean the interrelationships among concepts and laws, how postu- 
lates may mediate law-like predictions through the application of 
the rules of deductive logic, the role of mathematics in theoretical 
systems, and so on. These analyses have resulted in a fairly standard 
conception of what a “good” explanatory system consists. But, to 
use a trite phrase, it may be a mixed blessing to have this ideal model 
before us. The model has been largely filtered from the work in the 
older sciences, notably mathematical physics. This discipline, in 
terms of empirical and explanatory age, is so much older than 
psychology that a disservice to psychology may result if we attempt 
to emulate this model explanatory system before having reached the 
appropriate “explanatory-readiness age.” 

Furthermore, it is not an unquestionable assumption that theory- 
building in psychology must eventually follow the path of the 
physical sciences as mapped out by the philosophers of science. 
However, the relative success of the physical sciences with their 
theories, the commonality of men’s minds, and the zeal with which 

An Overview of Explanation 
in Psychology 

INTRODUCTION 


In introducing the first chapter, I 
said that science attempts to bring about a description and under- 
standing of nature. The descriptive phase is commonly identified 
closely with the research of science. Our discussion of the com- 
ponents of the research situation, operational definitions, and experi- 
mental designs were all oriented toward this descriptive or empirical 
component of science. We turn next to the second aspect of science, 
understanding. By understanding I mean the explanatory or theoretical
efforts of the scientist which so often accompany the research.

While I may speak somewhat glibly of description and explana- 
tion as two discrete components, it is obvious that in actual practice 
no such clear separation is apparent. If science were a static affair, 
if it consisted only of a classification of stable objects, or even 
events, such a bifurcation as made above might have more value. 
This is not the case. There is a constant shifting— a continual inter- 
play between the descriptive and explanatory efforts. Explanation is 
never ultimate in the mind of a scientist. What may be considered 
adequate explanation today may be relegated to theoretical purga- 
tory tomorrow. What may seem to some to be a straightforward 
empirical relation may be raised by a theoretician to the level of a 
postulate and used (with other postulates) to explain other relation- 
ships. Thus, a cross-sectional analysis of the explanatory concepts of 
a science provides no more than a momentary picture, a picture 

which produces such diverse opinions as do problems related to
explanation, such as what is theory, when should we have it, and so 
on. And, I know of no other area which presents the open-minded 
student with a body of writings so difficult to get one’s teeth into 
and find there something substantial to hang on to. Theories about 
theories shift about in a most unstable and indecisive manner. 


PROBLEMS IN UNDERSTANDING EXPLANATORY ATTEMPTS

Confusion in terminology. To a certain extent I have already
avoided the use of the word “theory” simply because its meaning is 
so ambiguous. The ambiguity can by no means be attributed only 
to the writings of psychologists, since the philosophers of science 
and scientists in other disciplines have contributed their share to the 
chaos. Let me sample a few writings on this topic. 

Bergmann, a philosopher of science, but quite close to psychology,
says:

... as vague as the customary use of the word “theory” itself (a, p. 537).

Campbell, an English physicist and philosopher of science, writes:

... it will be well to start by explaining in some detail exactly what 
meaning I propose to attach to the term “theory.” I shall not assume 
at the outset that my use of the word coincides with that generally 
adopted; indeed, since I shall urge that the general use covers proposi- 
tions of widely different form and significance, I can expressly disclaim 
that assumption (7, p. 120). 

And, a statement by Stafford, a psychologist:

... but we are still without a precise formulation of what we mean by
theory. Do we, all or any of us, mean by theory, one, all, or none of 
the following: implicit definition, speculation, postulate, hypothesis, 
assumption, correlation or coordination of variables, deductive elaboration,
explanation, school or system iz2, p. ( 5 r)?

Bergmann, this time writing with Spence, the psychologist:

Any attempt then to divide this hierarchy of constructs into sheep and 
goats, i.e., operational constructs and theoretical constructs, is of necessity
arbitrary (3, p. 6).

some theorists in psychology at least try to use physical theories as
models for their own theoretical efforts, make it seem likely that
sooner or later behavior theory will follow the lead of the older
sciences at least as far as it can (cf., Hempel & Oppenheim, 12, for a
discussion of this matter; pp. 325 ff.). But, to assess most explanatory
attempts in psychology against the formal systems of the philos- 
ophers of science (as derived from physical theory through philo- 
sophical considerations) would be a waste of our time at present. 
Such an assessment would show only that most explanatory attempts 
in psychology just simply do not approach the ideal formal struc- 
ture demanded of advanced explanatory systems. For example, if 
one looks at the elegance of an explanatory system as given by the 
philosophers of science (e.g., Cohen & Nagel, 8), with its formal
axioms and deduced theorems, one realizes immediately that it 
rarely contacts reality as far as explanation in psychology is con- 
cerned. A philosopher of science is not a working scientist; he is 
not faced with the day-to-day problems of the search for some 
form of explanation in the restricted area in which most scientists 
work. If it be argued that it is not the job of the typical scientist to 
construct comprehensive theory then we might agree, for very few 
scientists of any discipline have done so. If it is argued that it is not 
the job of the typical scientist to worry about explanation in his own 
limited area of research, again there would be no argument. But, I 
would simply point out that most scientists do engage in some kind 
of explanatory attempts and it is our task to try to come to some 
understanding of the thinking of the scientist in making these 
attempts. 

But now, to return to an earlier point, we may ask at what time 
is a discipline ready for these formalized systems of explanations 
propounded by the philosophers of science? When does the age of 
“explanatory readiness” arrive? There are a number of issues rele- 
vant to the answers to these questions. I think it will be wise, how- 
ever, if we delay the discussion of these issues until a little more 
ground work is laid. It may be said, however, by way of anticipation, 
that on the basis of the assorted opinions of psychologists who have 
spoken about this matter, we shall arrive at no satisfactory answers 
to the questions. Among psychologists I know of no other issue 




system of ideas or facts (no matter how small the system) when this 
system allows for deductions. In a very crude sense deductions fol- 
low the pattern: “If this is so, and that is so, then this must be so.” 
It is thus a form of syllogistic reasoning, although in mathematical
systems it cannot be handled so familiarly. Let me sample two of 
many writers who suggest that the word “theory” should be used 
only when the deductive arrangement exists among concepts. 

Such organization of empirical laws into deductive systems is the dis- 
tinguishing characteristic of scientific theories (^, p. ^j6). 

It is almost a platitude to say that every science proceeds, more or less 
explicitly, by thinking of general hypotheses, of greater or less gener- 
ality, from which particular consequences are deduced which can be 
tested by observation and experiment (s, p. ix). 

We might then agree that when deductive possibilities exist 
among terms we have the basis for use of the word “theory.” In 
order to keep the exposition in this and subsequent chapters terminologically
immaculate I would be delighted to use such a criterion.
But I can’t. While what is and what isn’t deduction may be crystal 
clear in the sciences where statements are mathematical in nature, 
in the theoretical efforts among psychologists where words rather 
than numbers dominate the science, whether or not deduction has 
taken place, or indeed, can take place is not easy to determine. 
Deduction, induction, and what might be called scientific intuition 
become somewhat confused. 

In order that a set of concepts may have deductive power, state- 
ments must be made concerning interaction among processes sym- 
bolized by the concepts. (An elaboration of the nature and meaning 
of concepts which may have this deductive power must wait until 
the next chapter.) These statements are such as those about summation,
subtraction, multiplication, etc. of the processes. They seem
to occur fairly generally in psychological writing. And, to me, this 
criterion of interaction comes somewhat close to being a way of 
deciding whether or not a system has deductive consequences.
Nevertheless, I have here again been unable to apply a proposed 
criterion to my own satisfaction; in some cases it is not at all clear 
that the supposed or postulated interaction process is necessary for 
the predictions which are made. In short, I think anyone would

Boring, the psychologist, must have despaired of using the word
“theory” in any differentiating sense: 

In brief, a generalized description is a theory. This meaning for the 
word theory is admitted by those who discuss the philosophy of 
science, although for the most part they prefer to limit the term to the 
more complex cases, the theories that exhibit the interrelationships 
among abstract concepts. I am insisting on the broader meaning because 
I am arguing from continuity. I am saying that concepts are created by 
inductive generalization, that science is made up of confirmed relation- 
ships among concepts, not among data, that theory is so pervasive that 
it penetrates even the observational instant, when the observer decides 
whether to classify his black-white perception as 10.7 or 10.8 on the
ammeter scale (^, p. 175). 

When Boring says that he insists on the “broader meaning” he 
means it, for he lists 15 kinds of psychological and scientific theories
which range from simple functional relationships between depend- 
ent and independent variables to mathematical models. And he says: 

The difference between being theoretical and empirical is mostly a 
question of how far the process of reification of the construct has 
progressed (^, p. 172). 

But, Boring solves nothing— he may not have intended to— for we 
do not know where in the process of reification one switches to the 
use of the word “theory.” Apparently this is left to the individual 
scientist to decide. 

I need not multiply the quotations. Let me say only by way of 
addition that the word “theory” is not the only term in this domain 
to cause confusion. What may be called axioms by some (8) may
be labeled postulates by others (/j) and hypotheses by still others
(7). What may be termed hypothetical constructs by some writers
(16) may be called transcendent hypotheses (14), inferred entities
(/) or “fictional concepts” (17). And there are many other sources
of conflicts (/).

But now, let me turn back to the word “theory” and ask if there 
is any degree of agreement which it might be useful to exploit. 
There may be. By a number of writers the assertion is made (in one 
form or another) that the term is most appropriately used for a

introducing explanatory attempts. What reasons does the psychologist
cite for the way in which he theorizes? Let us examine a number
of possibilities as they have been suggested by various writers
in psychology. 

1. It is a commonly observed fact about the organism that it is 
not a linear transmission system; its output— measured behavior— is 
not linearly related to the input of the environment. Seldom do we 
find straight-line functions for relationships between independent 
and dependent variables, even though we transform our measuring 
scales in a number of different ways. This fact suggests (it does not 
prove beyond doubt) that the transmission system from sense organs 
through to effectors does something about the input; it doesn't
“attenuate” or “amplify” at a one-to-one ratio. One function of 
theory has been said to suggest what the modifying processes are, and 
more precisely, how they modify. These guesses may be made in 
physiological terms (and so are intended to suggest actual physiological
processes) or they may be stated in strictly psychological
terms with no specific physiological mechanisms implied. 

This function of theory, while approached somewhat differently 
from the one indicated above, has been suggested by Spence:

Theoretical constructs are introduced ... in the form of guesses as to 
what variables other than the ones under control of the experimenter 
are determining the response (20, p. 51). 

Just how these guesses may assume acceptable theoretical or ex- 
planatory status is a matter for later discussion. For the moment, I 
wish to pursue Spence’s thinking further. He goes ahead to point 
out that these theoretical constructs are commonly called interven- 
ing variables. Then, to continue, in Spence’s words: 

If under environmental condition X1, the response measure R1, is
always the same (within the error of measurement) then we have no
need of theory. Knowing that condition X1 existed we could always
predict the response. Likewise if, with systematic variation of the X 
variable, we find a simple functional relation holding between X values 
and the corresponding R values we again would have no problem, for 
we could precisely state the law relating to them. But, unfortunately 
things are not usually so simple as this, particularly in psychology. 

On a second occasion of the presentation of X1, the subject is very

have some trouble in applying either the criterion of “deductive
consequences” or the criterion of “interaction of processes." 

While there is a certain amount of futility involved in using 
words which one cannot be satisfied with, we still must communicate 
as best we can. Therefore, I shall try to restrict myself in the use 
of the word “theory” to those situations in which the processes sym- 
bolized by the concepts interact so as to permit deductions. Though 
I have seriously considered dropping the word, I see that doing so
would not solve the basic problem. We are continually faced with 
the term in our readings and we must make the best of it that we 
can. I must be allowed, therefore, to use the word with some loose- 
ness of meaning. 

The word “explanation” will be used in a very general sense as
indicated below. It will be apparent that theory is one very impor- 
tant way or method of attempting to provide explanation in psy- 
chology. 

Purpose of explanation. Upon one general matter, scientists and 
philosophers of science have reached nearly universal agreement. It 
is that the purpose of explanation is to account for the greatest num- 
ber of facts or observations with the fewest number of principles or 
assumptions. I have used more alternative words in the above state- 
ment to encompass differences in usage among various writers, al- 
though I have by no means exhausted the variety of terms that have 
been used. Nevertheless, I think the objective of explanation is clear. 
The ultimate in explanation would be two comprehensive principles 
from which all relationships of the universe logically stemmed. It 
is perhaps needless to say that such a goal seems remote, indeed. 
Of far more importance for explanatory efforts in psychology is 
the fact that we work within limited areas within psychology and
current explanatory attempts must be evaluated as to how well they 
encompass facts or observations within any one area, no matter how 
small it may be. In other words, at present the purposes of explana- 
tion can be attained within very limited areas of research and be 
perfectly valid. In the long-run development of our science we hope 
that these areas will become united by common principles, but prog- 
ress toward that end may be expected to move at a very slow pace. 

Within the general statement of the purpose of explanation a number
of more specific reasons have been given for theorizing or

do just what was noted earlier as a generally-agreed-upon purpose
of explanation, namely, account for a broad range of facts by as few
basic principles as possible.

likely to exhibit a different magnitude of response, or in the second
example there may be no simple curve discernible between the two
sets of experimental values. It is at this point that hypothetical constructs
are introduced and the response variable is said to be determined
in part by X1, and in part by some additional factor, or
factors ... (20, p. 51).

In my opinion, Spence’s reason for theory as given here makes a 
very weak case. His first illustration indicates simply that there is 
an unreliable phenomenon, for he says that X1 does not result in
a consistent response. If X1 does not produce a consistent response,
i.e., if the reliability of the phenomenon is not established, then there
is nothing to theorize about. (I am sure Spence does not mean in
this case that theory is introduced to account for unreliability.) If
he means that on successive presentations of X1 the response changes
in a simple regular fashion (as it might in a learning situation) then 
these regular changes become as predictable as “no change” and on 
his own premise no theory needs to be introduced. And the com- 
plexity of the relationship, if it is reliable, does not change the 
ability to predict. In short, I find it very difficult to accept this 
opinion as to why theory is introduced. But, perhaps Spence’s 
opinion has changed, for in a later publication we find a somewhat 
different reason being advanced for theorizing.

2. Referring to the field of learning, Spence notes that if we are 
dealing with one response measure in a single type of experimental 
situation there might possibly be no need for theory because it is 
feasible that a single mathematical equation would fit the various 
curves of learning found in the situation when different variables are 
manipulated. But, if a number of response measures are used in 
several different experimental learning situations, several different 
types of learning curves may be found relating the response measure 
to the independent variables. 

Confronted with such a state of affairs, the theory-oriented psychol- 
ogist has attempted to integrate these isolated, particular sets of laws 
into a more comprehensive system of knowledge by means of his
theoretical formulations (ii, p. 153). 

It is clear that in this statement of one purpose of theory Spence is 
suggesting that within the area of learning, theoretical attempts may 




apparent without the theory. I do not think we should take this statement
(and similar ones) as a methodological axiom. And, the fact
that the theory suggests research does not mean that it is automatically
significant research (significant in the sense that the discipline
is advanced more rapidly than would have been the case if the research
had not been predicated on the theory). I have written elsewhere
about this matter. It has seemed to me that a great deal
of effort has been wasted in attempts to test theoretical disagree- 
ments in some areas of learning, as typified by the latent-learning 
studies. The research has tended to be bitsy-type research in which 
the end product hasn’t settled theoretical matters and often has not 
left us with the sound sorts of relationships between variables which
supersede any theory. Skinner has recently commented on this 
matter of research based on theory as follows: 

Research designed with respect to theory is also likely to be wasteful. 
That a theory generates research does not prove its value unless the 
research is valuable. Much useless experimentation results from theor- 
ies, and much energy and skill are absorbed by them (if, p. 194). 

Brogden comments in a similar vein. 

A theory may organize the results of many researchers, it may bring 
new relations to light, or it may serve as a catalyst for fruitful experimentation.
On the other hand, a theory . . . may impede advancement
seriously. It may fail to consider existing experimental evidence that 
does not support it; it may encourage research to proceed in non- 
productive channels; or it may define problems verbally that cannot be 
attacked experimentally (tf, p. 224).

It would seem then, that in evaluating the usefulness of theory, 
we would not only need consider whether or not the theory insti- 
gates research which would not have been done, but also whether 
the research so prompted can be advantageously assimilated into the 
body of knowledge. In psychology, at least, we have no grounds for 
accepting theory uncritically as a magical stimulator of research;
neither do we have a right to condemn it simply because in the 
opinion of some it has been wasteful in certain restricted areas. 
Clearly, there are differences of opinion on the issue, and that is the 
point to be kept in mind at the present time.

facts can easily become so great that it is beyond the capacity of the
human mind to assimilate them under the conditions of our culture. 
But, if a theoretical principle or principles can account for these 
facts logically, and if we remember the rules of logic, we can often 
“cue off” the detailed fact by working out the implications of the 
theoretical principles. It may also be noted that it is this very 
characteristic of a “good” theory which allows for the prediction of 
new facts, a point to be discussed later. But, even this “memory 
function” of a theory has certain dangers, for relevant facts not 
predicted by the theory, or contrary to it, may have a tendency to 
slip away. One may wonder if this is what may have happened to 
theorists who have been accused by their critics of “overlooking” 
evidence contrary to their theory. I do not wish to justify the over- 
looking of such data, but I would be interested in understanding the 
mechanisms which produce it. 

5. Another purpose of theorizing has been stated as “good” 
theory generates research. Actually, this statement may have two 
meanings. In the first place one who develops a theory or gets 
attached to someone else’s theory may get thoroughly ego-involved 
in it. This may strongly motivate him to do research to “prove” the
theory. If the research is sound, and if the relationships discovered 
stand by themselves without regard to the theory which instigated 
them, then I think we would agree that the personal relationship 
between the scientist and his theory may have had a beneficial effect 
for science as a whole. I do not know how many psychologists 
would not do research if they were not motivated by a strong affinity 
for some theory; it may be many or it may be few. But, it would 
be misleading to suggest that ego-involvement in a theory is entirely
beneficial just because it gets research done. This same ego-involve- 
ment may introduce blind spots for the significance of data not 
directly relevant to the theory. Important facts, acquired as by- 
products of theory testing, may be ignored. And, in view of what 
we know about the effect of motivation on behavior, it is possible 
that theory involvement could lead to a distortion in perceiving and 
reporting data. Therefore, we must not take for granted that theo- 
retically motivated research behavior can do naught but good. 

The second meaning to the statement that theory spawns research 
is that the theory itself suggests research which would not have been

I do not pretend that these are all of the roles which have been
attributed to theory in psychology, but they do represent the major 
ones. It is clear that we have disagreements about some of the specific 
roles of theory in our science. I suspect we would have near com- 
plete agreement on the general statement that the aim of explanation 
(via theory) is to account for the greatest number of facts with the 
fewest assumptions. But, the issue still remains as to whether or not 
our science is in a state of “explanatory readiness” for the more 
formalized theoretical attempts. Let us turn to a sampling of current 
opinion on this matter. 

IS PSYCHOLOGY AT AN “EXPLANATORY READINESS” AGE?

It is, of course, ridiculous to try to determine whether or not 
psychology had become of theoretical-readiness age on, say, June 10,
1955, or that it will become of age on, say, April 30, 1970. As we
shall see, theories in psychology could possess the formal structure 
required of them by the philosophers of science and yet vary greatly 
in their comprehensiveness, i.e., in the range of behavior phenomena 
which they encompass. It is not appropriate to talk about theoretical 
readiness for psychology as a whole because the areas within psy- 
chology differ markedly in their stages of empirical development. 
For example, in the study of sensory processes, particularly audition 
and vision, the empirical development is probably at a higher level 
than for any other area of psychology. In the realm of social 
processes, on the other hand, the accretion of a body of established 
phenomena and relationships is only in its initial stages. 

I think it can be taken for granted that our theories will con- 
tinue to develop within very limited areas rather than in terms of 
comprehensive theories of behavior. It may, however, be expected 
that there will be immigration of theories from one area to another 
within psychology. That is, if a highly developed theory is attained 
in one area it may be expected that it will have some usefulness in 
adjacent areas so that eventually all aspects of behavior may be in- 
corporated within a single system. But, let us not allow our wishful 
thinking to take us too many generations beyond the present, and 
perhaps beyond anything we can foresee with any assurance of its 
attainment. The point I wish to make is that it is not very meaningful
to talk about theoretical readiness for psychology as a whole, for
certainly it appears that our formalized theory will develop within 
areas. 

I have intimated that theoretical readiness depends upon the 
acquisition of a stable body of phenomena and relationships between 
and among independent variables and these phenomena. In short, the 
theoretician must have something to theorize about; the theory must 
of necessity cope with a fundamental body of knowledge if it is to 
be useful at the integrative level. Such a body of knowledge tran- 
scends any particular theory and yet seems necessary before serious 
formalized theoretical undertakings. This has been true in the 
sciences which now have highly developed theories. Whittaker, the 
English physicist, says:

Let it be frankly admitted that a certain body of knowledge must have 
been created by the methods of experimental physics before theoret- 
ical physics can make a start; the formulae of reflection and refraction 
must be known before Huygens can devise his Principle to explain 
them; but when the conceptions of theoretical physics have been intro- 
duced, they have a vitality of their own, and an adaptability to fields 
other than those in connexion with which they were introduced; and 
it is by them that the unification of isolated experimental results into 
comprehensive general theories is achieved (aj, p. 48). 

Now, let us turn to some remarks of psychologists concerning the 
theoretical readiness in their special areas. The field of learning in 
psychology has been one which, for some reason, has been most 
fertile not only for theoretical attempts but for discussion of the 
role of theories by men working in the area. Concerning theories in 
learning, Skinner (/y) has most recently taken the position that
they may not even be necessary. This fact was noted earlier, for
Skinner feels that actual retardation can occur in the field of learning
by paying too much attention to theory. It may narrow or restrict
(although perhaps not reduce) research efforts and not allow
for a more free play in systematic exploration of variables. But, even 
Skinner does not say theory of a kind will never play an important 
role in the development of the science of learning:

This does not exclude the possibility of theory in another sense. 
Beyond the collection of uniform relationships lies the need for a 
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fol to talk about theoretical readiness for psychology as a whole, for 
certainly it appears that oar formalized theory will develop within 
areas. 

I have intiimted that theoretical readiness depends upon the 
acquisition of a stable body of phenomena and relationships beween 
and among independent variables and these phenomena. In short, the 
theoretician must have something to theorize about; the theory must 
of necessity’ cope with a fundamental body of knowledge if it is to 
be useful at the integrative leveL Such a body of knowledge tran- 
scends any particular theory and yet seems necesary before serious 
formalized theoretical undertakings. This has been true in the 
sciences which now have highly developed theories. Whittaker, the 
English physicist says: 

Let h be frankly admitted that a certain body of knowledge must have 
been created by the methods of experimental physics before theoret- 
ical ph)’sics can make a sart; dbe formulae of reflection and refractira 
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them; but when the conceptions of theoretical physics hare been intro- 
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I do not pretend that these are all of the roles which have been 
attributed to theory in psychology, but they do represent the major 
ones. It is clear that we have disagreements about some of the specific 
roles of theory in our science. I suspect we would have near com- 
plete agreement on the general statement that the aim of explanation 
(via theory) is to account for the greatest number of facts with the 
fewest assumptions. But, the issue still remains as to whether or not 
our science is in a state of “explanatory readiness” for the more 
formalized theoretical attempts. Let us turn to a sampling of current 
opinion on this matter. 

IS PSYCHOLOGY AT AN “EXPLANATORY READINESS” AGE? 

It is, of course, ridiculous to try to determine whether or not 
psychology had become of theoretical-readiness age on, say, June 10, 
1955, or that it will become of age on, say, April 30, 1970. As we 
shall see, theories in psychology could possess the formal structure 
required of them by the philosophers of science and yet vary greatly 
in their comprehensiveness, i.e., in the range of behavior phenomena 
which they encompass. It is not appropriate to talk about theoretical 
readiness for psychology as a whole because the areas within psy- 
chology differ markedly in their stages of empirical development. 
For example, in the study of sensory processes, particularly audition 
and vision, the empirical development is probably at a higher level 
than for any other area of psychology. In the realm of social 
processes, on the other hand, the accretion of a body of established 
phenomena and relationships is only in its initial stages. 

I think it can be taken for granted that our theories will con- 
tinue to develop within very limited areas rather than in terms of 
comprehensive theories of behavior. It may, however, be expected 
that there will be immigration of theories from one area to another 
within psychology. That is, if a highly developed theory is attained 
in one area it may be expected that it will have some usefulness in 
adjacent areas so that eventually all aspects of behavior may be in- 
corporated within a single system. But, let us not allow our wishful 
thinking to take us too many generations beyond the present, and 
perhaps beyond anything we can foresee with any assurance of its 
attainment. The point I wish to make is that it is not very meaning- 
questions we can be open-minded and are likely to let the answer to 
one question influence the nature of the next. It is this way that we 
get acquainted with our universe. However, when we predict we show 
our maturity, and we can even determine a scientist’s success by cal- 
culating his percentage of correct predictions. But we must not act 
more mature than we really are (i#, p. 54). 

Thus Maier’s position is essentially “go easy.” Seven psychologists 
who analyzed five major theoretical attempts in the field of learning 
have this to say: 

On the one hand, we appreciate the need for a common theoretical 
structure to facilitate the ordering and application of our knowledge 
of learning. On the other, we recognize the complexity of the ma- 
terial which must be handled by a behavior theory. Vigorous individual 
attempts at theory construction along a wide variety of fronts are 
probably not only desirable but necessary for continued progress in 
this area (/o, p. xiii). 

MacKinnon, whose work has been largely in the area of person- 
ality, says: 

...I shall now express my opinion that in personality research our 
theorizing and building of models have outrun activities more inti- 
mately concerned with observation and data collecting. Our greatest 
need for the more adequate study of personality is systematic observa- 
tion and systematization of the data we collect, and this, I submit, 
is something more than theorizing (ij, p. 141). 

Koch, writing in 1950, states his belief as follows: 

We must start with the humiliating acknowledgment that psychology 
is in a pre-theoretical stage, and that the central problem of the funda- 
mental psychologist is not what doctrine to embrace or concoct, but 
simply to assay, realistically, how psychology can be made to move 
towards adequate theory (//, p. 298). 

Finally, let me cite George, an English psychologist. In proposing 
a reduction of linguistics of psychology to logical forms, and in 
speaking of theory in general, he somewhat petulantly condemns the 
strict empiricist as follows: 

This approach, it is hoped, will especially be brought to the notice of 
those narrow experimentalists who repeatedly call for experiment, 
formal representation of the data reduced to a minimal number of 
terms. A theoretical construction may yield greater generality than 
any assemblage of facts. But such construction will not refer to another 
dimensional system and will not, therefore, fall within our present 
definition (ip, p. 215-216). 

We do not seem to be ready for theory in this sense (ip, p. 216). 

Spence (21) makes it clear that the first objective of the scientist 
is to establish a set of laws relating the independent and dependent 
variables. Only when some precision is obtained for such relation- 
ships is the scientist ready to develop a formal theoretical system, 
albeit a system that explains a very narrow range of behavior. The 
fact that Spence has done considerable theorizing of a type in certain 
areas of learning suggests that he must believe that the state of 
empirical knowledge warrants such theorizing. One cannot be sure 
of this, however, for again we must keep distinct the formal theo- 
retical systems and other explanatory attempts; Spence's theorizing 
does not easily fit the formal systems idea. Hull (e.g., 13) has been 
the most ardent advocate of the formal theoretical systems in the 
field of learning and more than any other psychologist has accom- 
panied his ardor with the construction of the systems. How “good” 
or “useful” these systems are is another matter. 

Maier takes an in-between position with regard to the readiness 
age for theorizing in psychology of learning in an article called: 
“Premature Crystallization of Learning Theory.” 

I personally feel that an interest in theories is desirable for the develop- 
ment of science because theories help us organize facts and they help 
us to ask good research questions. However, an interest in theories can 
become a liability if it prevents us from exploring certain kinds of 
relationships or causes us to ignore facts if they do not fit the theory 
with which we identify ourselves. When these things occur, the theory 
becomes an attitude and ideas become good or bad rather than right 
or wrong. 

Perhaps we are somewhat overambitious and have assumed that psy- 
chology is more advanced than facts warrant. We seem to want a 
learning theory that works not only for all learning situations but also 
for all behavior. We seem to want to predict, to do research by stating 
hypotheses, and seem no longer to be content with asking questions of 
the universe and getting our answers through research. When we ask 
present; others have both. Some can carry their theories lightly 
(9); others never shed them. For those who have little interest 
in theory construction but do have interest in research, it is evident 
that the present state of theory in psychology, and thinking about 
theory, puts little restriction on their efforts. That is, there is a wide 
range of problems in psychology which need systematic explora- 
tion of the sort that derive direct relationships between dependent 
and independent variables. Such lawful relationships stand as a basic 
contribution to our science. If one has no interest in theory he need 
only fit his research into the empirical framework of the area as it 
stands at the moment. Others who have interest in theory will sooner 
or later fit his findings into a theoretical system. Let us recognize the 
fact of individual differences in abilities and interest and not expect 
every scientist to be able to do all the things which science in its 
totality is. Important discoveries have been made and will continue 
to be made by asking simple questions about the functioning of 
nature. Likewise, important contributions have been made and will 
continue to be made by theoreticians as they organize apparently 
diverse facts. The history of science gives no basis for the disparage- 
ment of either the systematic empiricist or the theoretician at any 
stage in the development of a science. 

PLAN FOR THE FOLLOWING CHAPTERS 

Having attempted to suggest some of the problems we face, some 
of the disagreements that dominate the over-all picture of explana- 
tory attempts in psychology, I shall now indicate the approach 
which will be taken in the following chapters. I said earlier that it is 
rather futile to analyze explanatory attempts in psychology by com- 
paring them with what I have called the ideal formalized structure 
recommended by philosophers of science as a result of their analyses 
of highly developed physical theories. We are interested primarily 
in how theories get started and how they grow. Looking at the 
formalized system, the fait accompli, might help us in setting our 
sights for the distant future but it does not at all reflect the agoniz- 
ing work of the many many scientists which went into the making 
of the system; it does not represent the false starts and the inelegance 
of theorizing which crop up almost everywhere in the efforts of the 

and decry theory, and continue to stumble through the maze of science 
erroneously believing themselves to be dealing wholly with facts, and 
never with linguistics (ii, p. 232). 

Now, let me quickly add, that the issue of whether we should or 
should not have theory, or the issue of whether if we are to have it, 
just how soon we are going to have it, is not going to be settled by 
the kind of opinion poll, of which I have given an inadequate sample 
above. But even this poll reflects the diversity of opinion and that is 
its purpose. I think it is well to remind ourselves continually that 
at our present stage of development in psychology, there are very 
diverse opinions on these matters. I hold the opinion, developed 
from talking with psychologists at a number of universities, that 
many Ph.D. candidates have been led to believe that unless a piece 
of research is predicated on at least one hypothesis, developed some- 
what formally from some assumptions, it can have little value. Such 
beliefs are provincial in the sense that they do not reflect the think- 
ing of a number of respected scientists in psychology today who feel 
that we should not allow ourselves to get immersed in theory at the 
present stage. 

But still, to repeat, no opinion poll will settle this issue; the best 
that could be hoped for is that from it some tolerance would result. 
Whether or not we are ready for formalized theory in certain areas 
of psychology can only be judged by the success such theories may 
have in working toward an organization of diverse facts under a few 
basic principles. The experts in each area will have to be the judges 
of this matter. For some of the more general theories in the field of 
learning the current evaluation is not too encouraging, although the 
same writers who have heavily criticized the theoretical attempts 
call for more— but better— attempts (/o). 

THE INDIVIDUAL SCIENTIST AND THEORY 

One of the basic observations of human behavior is that individuals 
differ. These individual differences exist among psychologists. Some 
psychologists develop no interest in theory; others do. Some do not 
have the motivation or ability to cope with theoretical matters; 
others do. Some have not had the training even if the motivation is 
ignored or the meaning and intent of the concept will be too greatly 
masked. 
scientist. The bulk of the explanatory attempts in psychology today, 
the explanatory attempts of the typical psychologist interested in 
the matter, deals with only two or three explanatory concepts and a 
limited area of behavior. It thus seems to me that to reflect most 
faithfully the explanatory attempts in psychology, we must deal 
extensively with single concepts and relationships among a few 
concepts. And our prime purpose is to comprehend, not exhort. 

So our initial and major effort in the material to follow will be 
directed toward single concepts. That is, taking a single concept, we 
must discuss its characteristics, what it is intended to do, how it 
differs from other concepts, opinions concerning how it should be 
used, and how it is related to other concepts which may be said to 
constitute an explanatory system. The task is not a simple one, for 
(a) a theory-like concept may be used in different ways by different 
writers; (b) the way in which a particular concept is being used by 
a writer is not always clear, and (c) the ways in which concepts 
differ are numerous. This presents, then, a somewhat unpalatable 
task, for surely injustice will be done some writers and some con- 
cepts. And, although I shall deal only with some of the differences 
among concepts, and shall try to make the differences clear enough 
by example and pointedness, it goes without saying that sharp lines 
of distinction will be an exception. 

The approach will be quite simple. I try to take the viewpoint of 
the individual research worker in the laboratory and look at the 
problems of explanation which confront him as he tries to “make 
sense” out of the data in his own limited area of research. We then 
try to analyze what he comes up with and occasionally speculate 
on his thoughts as he tries to move from data toward explanation. 

The following chapter deals exclusively with analyses of indi- 
vidual concepts. In this chapter I must again beg indulgence for 
some repetition of materials occurring in Chapter 3 (Operational 
Definitions). It was not my intention there to fully set forth 
the differences which actually prevail in the usage of operationally 
defined concepts. I shall try to correct these omissions in the follow- 
ing chapter. When dealing with the status and characteristics of indi- 
vidual concepts I shall try to avoid as much as possible the interrela- 
tionships among concepts although in some cases this must not be 



Some Characteristics of Concepts 


Because concepts with widely dif- 
ferent characteristics eventually enter into explanatory attempts, I 
must emphasize these differences in concepts as concepts before I 
consider how they may be used in explanations. This objective is 
the sole intent of this chapter. 

I do not feel that there is a good, single descriptive dimension 
along which I can order the concept analysis which is to be made. 
Differences among concepts are multidimensional. In a very loose 
sense I shall proceed along a dimension of abstraction, by which I 
mean one based upon how far the concept is removed from imme- 
diate data. There should be some way, as a sheer matter of con- 
venience, for designating modal points along this rough dimension. 
Being blessed with extraordinary inventiveness I have chosen to 
speak of five points called Level 1, Level 2, Level 3, Level 4, and 
Level 5. 

LEVEL-1 CONCEPTS 

As I indicated near the close of Chapter 3, I did not reflect fully 
the differences in attitude toward the question about what oper- 
ational definitions define. This was intentional, in part to keep the 
discussion tidy and in part because it is my personal bias that oper- 
ational definitions should be largely concerned with defining be- 
havioral phenomena. In the present discussion I will try to correct 
this one-sidedness arising from my intolerance. 

In Chapter 3, I insisted that operational definitions are concerned 
with behavioral phenomena; we define these phenomena by specify- 
ing the measuring operations required. However, certain writers say 
that operational definitions are given to independent variables. In my 
discussion, the definition of independent variables was a part of the 
operational definition of a phenomenon. I limited operational defini- 
second) are operationally defined by the physicist. And, if one 
wishes to trace the history of such definitions it can be seen that all 
are based on the discriminatory response of the human observer— 
the scientist (/y). But, let us pursue these matters no further. There 
are two points which, by way of summary, I wish to make. First, 
it is necessary to specify what one means by his independent vari- 
able, and such specifications are sometimes called operational defi- 
nitions. Second, certain words which may be used to summarize such 
specifications by some are used in quite different ways by others. 
The Level-1 concept as used here refers to the specification of the 
independent variable without immediate reference to the behavior 
of the subject. 


LEVEL-2 CONCEPTS 

We have seen that explanatory attempts are undertaken when 
there is some body of reliable knowledge. For psychology I have 
spoken of this body of knowledge simply as reliable phenomena, 
including thereunder reliable relationships between dependent and 
independent variables of all degrees of precision as well as relation- 
ships among dependent variables. These constitute the data which we 
try to explain. Usually such data are gotten by research, although 
this is not necessary as long as we can be sure of the reliability of 
the phenomena. Thus, it wouldn’t require a Latin-square design to 
determine that apples consistently fall to the ground rather than 
into the sky. In psychology most behavior in which we are inter- 
ested is somewhat more subtle than a falling apple and we are com- 
monly faced with the explanation of research-derived relationships. 
The notion for an explanation may come from casual observation 
and it may be supported ostensibly by drawing attention to inci- 
dental observations, but, for the most part critical explanatory at- 
tempts start and end with data derived from research. 

These considerations indicate that to fully understand the use of 
concepts in psychology we must return to our discussion of the 
characteristics of concepts employed to summarize operations used 
in defining phenomena. We might think that we could make short 
work of this and immediately proceed to concepts which have 
explanatory-like status. This is not so, for again, I must apologize 


tions to an if-if-then type of statement. Nevertheless, because certain 
writers speak of independent variables as being defined operationally, 
I am recognizing their position here and am calling these variables 
Level-i concepts. It is especially important that this be done at 
this point for sometimes words which are used to summarize oper- 
ations by some writers are used in quite another way by others. Let 
us look at a few illustrations. 

The term extinction is sometimes used in conditioning to indicate 
simply the removal of the unconditioned stimulus. That is, it refers 
only to an activity of the experimenter, not the activity of the ex- 
perimenter and the resulting behavior of the subject. It thus specifies 
a change in the experimental conditions and that is all it specifies. 
The word reinforcement is sometimes used to indicate that the ex- 
perimenter gave the animal food after a correct response; by others 
it is used to include both the giving of the food and the resulting 
change in behavior. 

The experimenter may say that he is operationally defining 
deprivation time (an independent variable) by number of hours 
since feeding. Cycles per second is defined as number of undulations 
of the sine wave in one second. Cortical lesion may be specified 
minutely in terms of surgical techniques used, exact site of lesion, 
and so on. 

Level-1 concepts, therefore, refer to activities of the experimenter 
in specifying what he means by a particular term used as a name for 
an independent variable; such definitions do not include behavioral 
aspects of the subject; they do not define behavioral phenomena in 
the sense that I have used the term in discussing operational defini- 
tions. Of course, when such terms are used to indicate what is meant 
by the independent variable, their meaning must be made perfectly 
clear. I simply have not included such specifications as operational 
definitions independent of the behavior of the subject. This is quite 
an arbitrary decision on my part. I may note, however, that many 
independent variables become such only after being operationally 
defined behaviorally. For example, the class of operations which I 
called Scaling Operations in Chapter 3 may result in a reliable dimen- 
sion which is subsequently manipulated in an experiment as a part 
of the definition of another behavioral phenomenon. The physical 
scales which are used in psychological investigations (e.g., cycles per 
cepts do not relate to any theory-like implications since not one 
whit of explanatory prejudice resides in the definition. 

I should note again, as in the chapter on operational definitions, 
that all operationally established phenomena in psychology do not 
have a name. For example, the fact is that differences in perform- 
ance that result from massed and distributed practice do not have a 
name, but as a phenomenon would usually be called a Level-2 con- 
cept. In deference to brevity, my illustrations will make use of 
phenomena to which names are usually assigned. 

2. Figural after-effect. I suspect any phenomenon named with the 
terminal word “effect” will be a Level-2 concept. The implication is 
clearly that of being a name for a behavioral phenomenon without 
any extra-operational cause implied. In the case of figural after-effect 
the operations specify that if a particular figure is fixated for a short 
period of time, and if then a test figure (with certain specifications) 
is observed, certain distortions will be reported, the distortions being 
delineated by comparison with a control in which the original figure 
was not fixated. The critical feature of the operations is the fixation 
of the original figure and again, this may be thought of as the “cause” 
for the subsequently measured distortion. But, nothing in the defi- 
nition implies or even intimates a cause over and above this fixation. 

3. Experimental extinction. In illustrating Level-1 concepts I said 
that the present term is used in a Level-1 sense sometimes. However, 
I think that the majority of writers think of this as a behavioral 
phenomenon of the subject. Furthermore, it is defined as a function 
of the critical variable (removal of the unconditioned stimulus) and 
no other causal mechanism is implied; the change in behavior result- 
ing from these particular operations is experimental extinction. 
Again, it is quite another matter to suggest some 
mechanism resulting from the removal of the unconditioned stimulus 
which is said to account for or explain extinction. 

I need not extend the illustrations; it is enough for ... 
that Level-2 concepts are as empirical as it is ... 
behavioral phenomenon via operational definitions ... 
represent phenomena which have a strong ... 
... Level 2 ... seem to suggest that the phenomenon identified 
for not fully representing in Chapter 3, the differences in attitude 
which exist toward operationally defined concepts. As a matter of 
fact it is this difference in attitude toward operationally defined 
concepts which, in my opinion, has created a most perplexing con- 
fusion (if confusion can be anything but perplexing). This con- 
fusion will come out most clearly when I subsequently contrast 
Level-2 concepts with Level-3 concepts. So first, I must indicate 
what I mean by Level-2 concepts. 

A Level-2 concept is one which summarizes the operations used 
to define a phenomenon and therefore merely identifies the phenom- 
enon. It may be thought of as phenomenon identification or phenomenon naming. 
The definition of the phenomenon implies not one thing about a 
causal process or condition over and above the operations per se. 
My presentation of operational definitions in Chapter 3 was entirely 
of this nature. That this presentation was by no means reflective of 
current practice in psychology will become evident (and I repeat 
myself) in discussing Level-3 concepts. For now, however, let me 
... Chapter 3. 


1. Reminiscence in motor learning. To define adequately this 
... definition. To ... 
the definition in general terms we would say that if two groups 
... on a motor task, and if after a specified number of 
trials the first group is given a short rest and the second not, and if 
performance of the first group is superior to the second after 
... defined. Difference in the performance of the 
two groups after the rest of the first group is reminiscence. The 
phenomenon has, in such definitions, almost a “point-at-able” status. 
Nothing is implied in the definition about any causal factor other 
than the rest pause. The rest pause is the critical independent vari- 
able in the defining operations. Now, of course, that an investigator 
may give such a definition doesn’t mean that he doesn’t have ideas 
or notions about processes taking place during the rest pause which 
may have produced the observed difference. He simply is not letting 
these ideas enter into his definition. At the strictly empirical level 
the rest pause is the cause of reminiscence but explanatory attempts 
may be expected to go beyond such a notation. These Level-2 con- 
concepts, but first we need some illustrations of concepts which are 
commonly given in Level-3 language. Again, for the sake of brevity, 
I shall not give complete definitions, for the concepts are familiar 
and we need not worry about the formalities in this particular 
instance. Some of my illustrations were used in Chapter 3, but there 
I used Level-2 language since I was at that time trying to avoid the 
present conflict. 

1. Drive. I suspect we have no purer case of a Level-3 language 
than that commonly used to define drive in animals. Drive may be 
defined by relating differences in deprivation time (such as for food 
or water) to performance differences (such as activity level). 
Having done this it would be quite unorthodox to “point at” the 
performance difference and say “that is drive” (as we would do at 
Level-2). Rather, we almost universally think of drive as something 
which causes the performance difference; we say that differences 
in drive caused the performance difference. This “something” may 
be thought of as a purely abstract something or it might be thought 
of as changes actually taking place in the organism, without specify- 
ing in the definition what these are. (The matter of locus or reality 
of such concepts will be discussed later). Nevertheless, our defini- 
tion implies something which is changed by changes in deprivation 
time and this in turn causes the performance difference which we 
have observed. 

2. Frustration. In defining frustration our attitude toward what 
we are dealing with usually puts a process or state (called frustra- 
tion) “inside” the organism to account for the difference in per- 
formance. 

3. Repression. If this concept were operationally defined success- 
fully, i.e., if reliable performance differences could be obtained as a 
consequence of unique operations, most psychologists would, I feel 
confident, think about it as something which caused the perform- 
ance difference. 

These three illustrations indicate phenomena which are defined 
by E/C, S-R type operations. I shall shortly turn to concepts 
defined by response identification. It will be remembered that cer- 
tain phenomena are defined by S-R identification with a physical 
stimulus scale and some with a psychological scale. The operations 
are somewhat different from those used in E/C, S-R definitions. I 
exists without reference to the organism (the subject). The Level-2 
concepts, via their definitions, suggest that we are dealing with cold, 
hard, empirical findings; the subject is largely ignored. This is in 
contrast to the Level-3 operational definitions to which I now turn. 

LEVEL-3 CONCEPTS 

Like Level-2 concepts, Level-3 concepts are operational defini- 
tions of phenomena; the only difference comes in the wording of 
the definition, usually in the final phrase. This difference in wording, 
I strongly suspect, comes about largely because of differences in 
the conceptual predilections of the scientist regarding certain phe- 
nomena. Level-3 concepts name or identify a phenomenon just as 
do Level-2 concepts; but, the name is applied to a hypothetical 
process, state, or capacity as a cause for the observations indicating 
the phenomenon. Thus, if we use the E/C, S-R operational differ- 
ence, the definition details the operations and then, in one way or 
another, says: “this difference between the two conditions leads me 
to infer a process (state, capacity) causing the difference, and I shall 
call this Process X.” Level-3 concepts may be thought of as causal 
naming or causal identification. 

Let me contrast this type of conceptual thinking with that lead- 
ing to Level-2 concepts. For illustration I will use reminiscence in 
motor learning. The definition at Level-2 stated that the difference 
in performance is reminiscence. If the definition were recast into 
Level-3 language, we would say, after indicating the operations: “If 
a difference obtains between performance of the two groups after 
rest, I will infer a process which caused the performance difference 
and I will call this process reminiscence.” I do not find that such a 
definition is customarily given for reminiscence but such could 
be given and when so given it has the basic characteristic I am 
using to identify Level 3. 
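
The contrast can be put in the form of a small sketch. What follows is an illustration only and is not drawn from the literature: the function names, the invented scores, and the use of a simple difference between group means are all assumptions made here. Both definitions carry out identical operations on identical data; they part company only in the final phrase, that is, in what the name is attached to.

```python
from statistics import mean


def performance_difference(rest_group, no_rest_group):
    """Mean post-rest score of the rested group minus that of the unrested group."""
    return mean(rest_group) - mean(no_rest_group)


def level2_definition(rest_group, no_rest_group):
    # Level 2: the name is attached to the observed difference itself.
    diff = performance_difference(rest_group, no_rest_group)
    return {"reminiscence": diff}


def level3_definition(rest_group, no_rest_group):
    # Level 3: same operations, but the name is attached to an inferred
    # process which is said to cause the observed difference.
    diff = performance_difference(rest_group, no_rest_group)
    return {
        "observed_difference": diff,
        "inferred_process": "reminiscence" if diff > 0 else None,
    }


# Hypothetical post-rest scores, invented for illustration only.
rested = [12, 14, 13]
unrested = [10, 11, 9]
print(level2_definition(rested, unrested))
print(level3_definition(rested, unrested))
```

Nothing in the arithmetic distinguishes the two levels; the distinction lies wholly in how the result is labeled.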

The implication of the considerations thus far is that both Level-2 
and Level-3 concepts may be, and often are, based on the same 
formal type of operations; the differences come about because of the 
way the scientist thinks about what he is dealing with, and this differ- 
ence in the scientist’s thought processes is reflected in the definition. I 
shall return shortly to some further characterizations of Level-3 
locus or position within the organism. Nevertheless, many scientists 
using Level-3 concepts often find it very difficult not to think in 
terms of a “real” process as opposed to an abstraction existing no- 
where except in the scientist’s fantasies. (See Kneale, y, p. 354 and 
Beck, 2, p. 370 for discussion of this problem in the physical 
sciences.) It should be clear, therefore, that the vertical bars in the 
diagram to follow do not represent the soma of an organism unless 
the reader puts it there. The space between the bars need represent 
nothing more than the infinite domain of scientific abstraction. 

In defining concepts operationally by S-R type definitions we 
have specified stimulus manipulations (Sm) and specified response 
differences (Rd). Level-2 concepts avoid the idea of a state or proc- 
ess and so the concept is defined by referring directly to the relation- 
ship between Rd and Sm. Level-3 concepts name a process or state 
as causing Rd and this process or state is related directly to Sm. 
Diagrammatically, these may be depicted as in the accompanying 
figure. 


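[Figure: Level 2 relates Sm directly to Rd; Level 3 places a process or state (X), drawn between vertical bars, between Sm and Rd.] 
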
a state or process only if n rehab process 
and then says that this difference is cau y sounds to 
(X). Differences in X are in turn caused by Sm 
you like scientific do not always make 
It should be mentioned that Level 3 made it here, but it 
circularity of the inference so 

is inevitably present, obtained I shall infer a 

see no reason why these other two types of S-R identification pro- 
cedures could not be used for defining Level-3 concepts. But, 
usually this is not the case. Rather, these concepts are handled by 
what I have called the Level-2 definitions. Thus, pitch, brightness 
contrast, flicker fusion, and so on, appearing most commonly in 
sensory-perception studies, are usually defined at Level 2. Occasion- 
ally, having defined the term at Level 2, the writer may slip and talk 
as if it (the defined phenomenon) is now causing itself. 

There are phenomena which are by no means consistently placed 
at Level 2 or Level 3. For example, it seems to me that the phenom- 
enon of generalization is sometimes placed at Level 2 and sometimes 
at Level 3. Also, transfer is ambiguously placed. Sometimes the 
phrase transfer effect is used, which clearly puts it at Level 2. But 
at other times the word transfer implies a state of interaction which 
causes a given performance difference and thus falls nearer to Level 
3. The term closure is sometimes used to indicate a process or cause 
of a phenomenon and sometimes to indicate the phenomenon itself. 
However, such ambiguities are not my primary concern at this 
point. The major point I wish to make is that there are concepts, 
based on the same formal type of operations, which are “thought 
about” differently by psychologists. The distinction between Level 2 
and Level 3 is intended to reflect this difference in scientists’ thought 
processes. 


FURTHER DIFFERENCES AND SIMILARITIES AMONG 
LEVEL-2 AND LEVEL-3 CONCEPTS 

It may be useful to diagram the differences which are involved in 
these two levels. But, at the same time it should be recognized that 
there is a certain danger involved in static diagrams. Either at Level 2 
or Level 3 the relationships among the various terms in the defini- 
tions (that is, the stimulus manipulations, the response difference, 
and, for Level 3, the hypothetical process or state which produced 
the difference) need not imply anything about the organism. The 
hypothetical process in Level-3 concepts need not imply any- 
thing except a name for an assumed causal process. This causal proc- 
ess is inferred from the empirical relationship, but it need not have a 

this capacity among individuals which caused the difference. To 
make the diagram identical to that used for S-R Level-3 concepts an 
arrow would be added between Sr and X. However, in this case the 
meaning of such a connection would be quite obscure. Would we 
say that Sr caused the differences in X? No, I think this would be 
revolting to most psychologists; Sr only enabled us to establish 
that individuals differ in amount of X. Did the “quantity” of X 
for a given person result from Sr? I think most would say “no” 
again since this is really the same 
question. A given amount of capacity or state X already exists, 
Sr was merely a vehicle which allowed 
Like the Skinner box, a pursuit rotor, or a Snellen 
and-pencil test (such as an intelligence test) is a 
behavior. From differences in this behavior elicited by the 
differences in capacities of subjects are defined. Of 
related antecedents (such as heredity in the case of intelligence) 
but they do not enter into the two types 

[Figure: Sr -> (X) -> Rid] 

Elaboration of Level-2 and Level-3 concepts. Th 
of elaboration of Level-2 and Level-3 concept 
cuss. One may be thought of as operational inde- 
stimulus variable elaboration. While these 
pendent, it will 
In Chapter 3, I noted that several 
classed together if they of goal-directed behavior; 
tion is defined as resulting from bl between 
retroactive could be defined as 
original learning and recall 
resulting from deprivation of bio must be 
and so on). All that a phenomenon to be 
a reliable performance difference differences in the specific 
defined. 

end of the definition which would say: “X caused the difference.” 
But, this causal inference is implicit for if no reliable response differ- 
ence were found X would not have been inferred; X must be, in 
a manner of speaking, responsible for the performance difference. In 
short, it would appear that Level-3 concepts have a certain amount 
of risk about them which is not present on Level 2. But, let us 
delay further evaluation until we have had an opportunity to see 
how such concepts may be elaborated. I wish to turn now to the 
status of simple response-defined concepts. 

Simple response-defined concepts. In Chapter 3, it was pointed out 
that this class of operationally-defined concepts is well typified by 
concepts resulting from the use of paper-and-pencil tests. Take the 
case of intelligence. If intelligence is defined at Level 2 we indicate 
that there must be reliable individual differences on a specified test. 
A given individual’s intelligence is defined as the score made on the 
test specified (related, of course, in some fashion to scores made by 
other individuals). Very few psychologists use such definitions; 
rather, the response measure (score on test) is used to infer a hypo- 
thetical state or capacity which is called intelligence. Intelligence is 
responsible for the score on the test. And, I think that most psychol- 
ogists are almost compelled to think of this as some capacity which 
really exists in the organism, although I repeat that such thinking 
is not demanded by the operations. So, then, I am asserting that 
intelligence is most commonly developed as a Level-3 concept. I 
would further assert that many other subject capacities are concep- 
tualized in this fashion, e.g., mechanical aptitude, introversion, 
anxiety. 

Let me first remind you that concepts such as intelligence are 
defined through the use of a static stimulus situation. That is, all 
subjects are treated in the same fashion; there is no active stimulus 
manipulation as in the case of S-R defined concepts. To define the 
concept all that is needed are reliable individual differences in re- 
sponse to this static stimulus situation. I will call the static stimuli Sr, 
and the individual differences Rid. In most simple form, the depic- 
tion would be as shown in the accompanying diagram. What this 
diagram means is that if Sr produce reliable individual differences, 
an inference is made that there exist differences in the “amount” of 

for the specific operations fitting under the general operations, and 
we can picture the two levels as in the accompanying diagram. 



Operational identification also takes place in response-identified 
concepts. Actually, this identification is what I have called complex 
response identification in Chapter 3, and is typified by factor ana- 
lytic procedures. Factor-analytic attempts may be, and usually are, 
preceded by logical-rational considerations or the formulation of 
hypotheses about the composition of subject capacities. I think it 
will be well to quote Thurstone on this matter: 


In the light of a good deal of experience with the factorial methods, 
we should be able to give students a few practical suggestions. In the 
Psychometric Laboratory at Chicago, we spend more time in design- 
ing the experimental tests for a factor study than on all of the com- 
putational work, including the correlations, the factoring, and the 
analysis of the structure. If we have several hypotheses about postulated 
factors, we design and invent new tests which may be crucially differ- 
entiating between the several hypotheses. This is entirely a psycho- 
logical job with no computing. It calls for as much psychological 
insight as we can gather among students and instructors. Frequently 
we find that we have guessed wrong, but occasionally the results are 
strikingly encouraging. I mention this aspect of factorial work in the 
hope of counteracting the rather general impression that factor analysis 
is all concerned with algebra and statistics. These should be our serv- 
ants in the investigation of psychological ideas. If we have no psycho- 
logical ideas, we are not likely to discover anything interesting, because 
even if the factorial results are clear and clean the interpretation must 
be as subjective as in any other scientific work (/ , p. 

elaboration of the specific operations which fit under the general 
operational definition of a phenomenon, i.e., the identification of all 
of the operations producing the phenomenon. Stimulus variable 
elaboration refers to a working out of how different specific oper- 
ations produce differences in amount of a phenomenon. This, in 
essence, refers to the search for variables influencing the phe- 
nomenon. 


Operational identification normally takes two steps. First, there is 
a logical-rational consideration by the scientist (maybe the word 
guess is more appropriate) that a particular set of operations, fall- 
ing under the general definition of a phenomenon, will produce the 
phenomenon. Thus, an investigator at some time guessed that falsify- 
ing scores (in a downward direction) would produce blocking in 
a strongly motivated subject and cause a performance difference 
(nonblocked) 
a difference did indeed occur, frustration would be 
demonstrate experi- 
a difference between control and experimental subjects 
of difference in treatment. Another 
restraint of a small child 
two or more operations, 
differing in detail but fitting the general definition of a phenomenon, 
operational identification has occurred. Dif- 
operations may produce differences in amount of 
apparent that the number of 
specific operations will vary depending on how 
much procedure is to be called a differ- 
separately. This is entirely an arbitrary matter. 
therefore, consists essentially in observing 
a phenomenon are such as to pro- 
already established phenomenon. The ana- 
lytical nature of operational identification and its corresponding 
contribution to the science is so great, in my opinion, that I shall 
return to an extended discussion of it in the next chapter. 

Now again, however, for S-R definition of phenomena, concep- 
tualization of operational identification may take Level-2 or Level-3 
form. Let SiK stand for the general operations and Sm with subscript 

involved. Other variables will probably influence the extent of the 
phenomenon and whether or not this is true can be shown by 
manipulating them in at least two amounts. When such variables are 
explored at several points we obtain systematic laws of behavior. 
We not only know that the variable is relevant (it will influence 
the amount of the phenomenon) but we also know the precise 
relationship between the two, the degree of precision depending on 
the number of points explored and the precision or reliability of our 
response measure. 
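
The step from showing that a variable is relevant (two amounts) to knowing the form of the law (several points) can be sketched numerically. The sketch below is only an illustration under stated assumptions: the response function, its constants, and the names `fit_line`, `compare_forms`, and `hypothetical_response` are invented here and do not come from any study discussed in the text.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, sum of squared residuals)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, ssr


def compare_forms(amounts, responses):
    """Residual error of a linear law and of a logarithmic law for the same data."""
    _, _, ssr_linear = fit_line(amounts, responses)
    _, _, ssr_log = fit_line([math.log(s) for s in amounts], responses)
    return {"linear": ssr_linear, "logarithmic": ssr_log}


def hypothetical_response(amount):
    """Assumed relation between stimulus amount and response (illustration only)."""
    return 3.0 + 2.0 * math.log(amount)


# Two amounts show that the variable matters, but any form fits two points.
two_amounts = [1.0, 8.0]
two_point_fit = compare_forms(
    two_amounts, [hypothetical_response(s) for s in two_amounts]
)

# Several amounts begin to separate one form of law from another.
five_amounts = [1.0, 2.0, 4.0, 8.0, 16.0]
five_point_fit = compare_forms(
    five_amounts, [hypothetical_response(s) for s in five_amounts]
)
```

With two amounts both candidate laws leave essentially no residual error, so nothing can be said about the form of the relationship; with five amounts only the logarithmic form fits these error-free values. Real response measures are unreliable to some degree, which is why the precision of the law also depends on the reliability of the measure.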

Again, the particular schematic conceptualization of stimulus- 
variable elaboration will depend on one’s bias for Level-2 or Level-3 
thinking. The essential idea is that we “put in” the stimulus manipu- 
lations defining the phenomenon and then note how other stimulus 
manipulations (not used in defining) influence the magnitude of 
the phenomenon. Let us here set off the critical stimulus manipula- 
tion as Sm, and call other stimulus variables S1, S2, etc. 


[Figure: Level 2, with Sm, S1, and S2 each related to Rd] 


The Level-2 scheme is perfectly straightforward. The phenom- 
enon is said to be such and such a function of 
different age show a Furthermore, if the relationship 
was caused by a difference in drive say that 
between age and behavior is, say, logarithmic. In other 
the relationship between age and 

Thus, a part of the factor analytic procedure is to make certain 
guesses about the structures of subject capacities and how these 
could be “brought out” by certain tests. This activity is indulged 
in by all scientists; probably no one undertakes an investigation 
without some thought as to “what will happen.” Such thoughts may 
not be verbalized but that they are almost universally there no one 
can doubt. 

Turning back to factor analysis, let us diagram the situation for 
these Level-3 type concepts. I shall again speak of static stimuli (Sr) 
and picture the situation after factor analysis has been com- 
pleted. To simplify the diagram, I will say that only two factors have 
been determined (X and Y) and ignore variance unaccounted for. 
As a result of the correlation among scores on tests 1 through 4, 
correlation among scores on tests 5 through 7, and the 
and between the two sets, two factors are inferred, X 

[Figure: static stimuli (tests 1 through 7) related to two inferred factors, X and Y] 

Operational identifica- 
tion of simple response-defined concepts. 
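
The pattern of correlations from which two factors would be inferred can be illustrated with simulated scores. This is a sketch under loud assumptions, not a factor analysis: the two capacities, the noise level, the number of subjects, and the assumption that scores correlate little between the two sets of tests are all invented for illustration, as are the function names.

```python
import random


def pearson(xs, ys):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_scores(n_subjects=500, seed=0):
    """Scores on tests 1-7; tests 1-4 draw on capacity X, tests 5-7 on capacity Y."""
    rng = random.Random(seed)
    scores = {test: [] for test in range(1, 8)}
    for _ in range(n_subjects):
        x = rng.gauss(0.0, 1.0)  # hypothetical amount of X for this subject
        y = rng.gauss(0.0, 1.0)  # hypothetical amount of Y for this subject
        for test in range(1, 8):
            capacity = x if test <= 4 else y
            scores[test].append(capacity + rng.gauss(0.0, 0.5))
    return scores


def mean_correlation(scores, pairs):
    """Average correlation over the listed pairs of tests."""
    values = [pearson(scores[a], scores[b]) for a, b in pairs]
    return sum(values) / len(values)


scores = simulate_scores()
set_one, set_two = [1, 2, 3, 4], [5, 6, 7]
within_one = mean_correlation(
    scores, [(a, b) for a in set_one for b in set_one if a < b]
)
within_two = mean_correlation(
    scores, [(a, b) for a in set_two for b in set_two if a < b]
)
between = mean_correlation(scores, [(a, b) for a in set_one for b in set_two])
```

Scores correlate highly within each set of tests and only negligibly between the sets. It is from a pattern of this kind that two separate capacities would be inferred; the sketch builds the capacities in by assumption, whereas the investigator must work backward from the correlations alone.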

Stimulus-variable elaboration, which is largely a 
refinement of operational identification as far as S-R defined concepts 
different specific operations produce different 
of a given phenomenon, the inference is that the oper- 
amounts of a particular stimulus vari- 
behavior, or that 
variables are involved which influence behavior 
differentially. At the empirical level, then, the task is one of sys- 
tematically manipulating potential variables to see what influence 
they have on the phenomenon. Such manipulation will, of course, 
occur within the scope of the general operations defining the phe- 
nomenon. In the simplest case of the operational definition of a 
phenomenon, two different amounts of the critical variable are 

getting away from the avowed principle of keeping the number of 
phenomena to be explained at a minimum. Nevertheless, we do not 
ignore differences for the sake of parsimony; we must note such 
differences when they occur, keep them conceptualized separately 
and, when an explanatory system is constructed, account for the 
differences in the system. 

Are Level-2 concepts “better” than Level-3? If we limit our con- 
siderations solely to the formal operations, Level-2 and Level-3 
concepts are identical. The difference comes only in the wording 
(or implied wording) of the verbal report of the operations. It might 
seem that Level-3 concepts stray more from the operations than do 
Level-2 concepts and that this is “bad.” The straying comes about, 
ostensibly, in the naming of a cause for a reliable phenomenon. In 
the strict sense, with a Level-3 concept nothing more is being said 
than that a reliable phenomenon has a cause and the name is applied 
to the hypothetical cause rather than to the phenomenon per se. 
Since we maintain a deterministic position in science, there can be 
little “wrong” with saying a phenomenon has a cause and in giving 
this cause (although entirely uncharacterized in the simplest case) 
a name. 

If there is any danger involved it may stem from two sources. 
The first is that we may sometimes feel we have gone further 
toward explanation when using Level-3 than when using Level-2 
concepts. This is patently not the case, since the operations are 
the same in both cases. That we may think about the results of the 
operations differently (thus leading to Level 2 versus Level 3 concepts) does 
not add one bit more explanatory validity to one than to the other. 
The psychologist who defines his concepts at Level 3, irrespective of 
the amount of elaboration (as discussed earlier) which has taken 
place, is, at that point, in possession of no more explanatory power 
than the psychologist who thinks at Level 2. If a psychologist does 
think he has more explanatory power when using Level 3 
if he tends to delude himself that his explanation is 
than when he uses Level 2, then there is danger in using Level-3 

The second danger grows out of the 
that when a concept does no more than 
have is a set of relationships. A Level-3 

words, the relationship between a specific stimulus variable and drive 
is said to be exactly the same as the relationship between this variable 
and behavior. Thus, the properties of the hypothetical state are 
directly inferred from and are completely isomorphic to the observed 
relationship between the manipulable stimulus and behavior. 

Refinements. As knowledge about a phenomenon grows, changes 
may take place in the scheme for conceptualization. The first 
changes may be those indicated above, namely, operational 
identification and stimulus-variable elaboration. Both of these 
changes reflect greater inclusiveness. Now, however, changes may 
occur in the direction of less inclusiveness when data demand it. 
While several different specific sets of operations may be included 
under a general class of operations (all said to be defining 
a given phenomenon), the lawful relationships between 
manipulable variables and the phenomenon may vary in some 
characteristic fashion for different specific sets of operations. 
Or, the influence of one phenomenon on another may 
differ as a function of specific sets of operations. Thus, the 
influence of thirst drive on a standardized task may differ appreci- 
ably from the influence of the hunger drive on the performance 
on the same task. Degree of interpolated learning may influence 
retroactive inhibition on a verbal task in a certain way and in a 
somewhat different way on a motor task. If one is interested in 
exploring and explaining such differences (and as scientists we 
would be) the conceptualization would be kept separate. Thus, for 
hunger and thirst drive at Level 3, we would have two independent 
sets of relationships as shown in the diagram. 

With such a conceptualization, of course, we can have the 
stimulus-variable elaboration for each drive. If the data demand it, 
that is, if relationships differ appreciably, such independent concep- 
tualizations must be made. But, we should note that this should be 
done only when the data demand it, for we are in a serious sense 

think the basis for the danger is having a damaging effect on our 
conceptual thinking. 

Now, if you ask me at this point what I mean by explanation I 
will simply plead that I must postpone further discussion of this 
issue until later. You may also insist that I am asking for something, 
namely explanation, which we actually have when we have a set of 
empirical relationships such as we indeed will have when elaboration 
of a concept has been far advanced. That is, you may ask me, what is 
there to explain at the psychological level over and beyond these 
relationships? At this time I can only say again that the same set 
of relationships will hold for a Level-2 concept and we go about try- 
ing to explain “something” at the psychological level, whereas for 
the same set of relationships for a concept defined at Level-3 we may 
not. 


LEVEL-4 CONCEPTS 

Fundamental characteristics. Whereas a Level-3 concept is intro- 
duced through defining operations, a Level-4 concept is 
by postulational procedure. In its most common form a Level-4 
concept is postulated to account for phenomena defined at Level 2. 
As a very simple case, assume that a 
phenomenon, X, at Level 2. Assume further that he is interested in 
explaining this phenomenon beyond that given by the 
ation needed to produce the phenomenon and beyond what 
stimulus-variable elaboration may have occurred. The psychologist 
approaches the explanatory problem initially 
“I will postulate a process (or state or capacity, 
predilections and the phenomenon 
which is the cause of X.” As indicated earlier in the case of Level-3 
in the two cases. 

Why Level-4 concepts? If process to ac- 
count for every independently defined phenomenon there 
would obviously be no economy there would 
be considerable redundancy in our in the 

funnel these relationships through a common term. If the Level-3 
concept is a psychological one (as opposed to a physiological one) 
it has no more locus or substantive existence than does gravity. Yet, 
it is apparent in the literature that the great bulk of psychologists 
think of Level-3 as states or processes which have real existence in 
the body of the organism. Thus, I have continually used such terms 
in my previous discussion. For most of us it is difficult not to think 
in such terms. But, the danger is that these Level-3 concepts become 
spooks, or pixies or elves which have existence in one form or 
another. Actually, the spooks or pixies, properly sterilized, are 
probably not such a bad way to think about these processes; most 
of us disavow belief in the layman’s kind of spirits so for us they 
may perform only as aids to the thought processes. Some might say 
it would be more dangerous to think of the hypothetical processes 
in terms of physiological mechanisms which do have an aura of 
reality. Now, I would say immediately that such hypothetical 
physiological mechanisms may have value in directing a search for 
physiological correlates of the psychological relationships which led 
the investigator to think in terms of physiological mechanisms. But, 
if we are searching for explanatory systems at the psychological 
level, there is a possible danger involved in thinking of Level-3 
concepts in terms of physiology. This is because the explanatory 
attempts at the psychological level may stop at this point. 

Therefore, I think the real danger of Level-3 concepts is that 
they tend to stop explanatory attempts at the psychological level 
much more than do Level-2 concepts. How many psychologists 
have attempted to explain drive through the use of psychological 
concepts? How many have attempted to explain frustration? Com- 
pare the frequency of these attempts for Level-3 concepts with those 
for Level 2, such as experimental extinction, reminiscence, and so on. 
Level-3 concepts impede explanatory attempts at the psychological 
level; Level-2 concepts invite them, yet both are formally based on 
exactly the same operations. I think this difference results almost 
exclusively from the fact that we tend to think that Level-3 concepts 
imply an existence of a real state or process in the organism and, 
therefore, what more is there to explain at the psychological level. 
In short, I not only think there is a danger in Level-3 concepts but I 

states which enter into accountings made 
I do not intend to imply that these when 
status; I mean only to imply a some proc- 
investigator has made certain observations and postulates some pr 
ess to account for these observations. illustrations 
A studied intolerance. Having given s 
of postulated concepts or ideas of postu- 
names) I must now dismiss rather a I wished to 
lated concepts from stage of the exposition I 
avoid criticism as much as possible altogether. I wish to 
find right now that I lated processes which 
remove from further consideration present stage of develop- 
cannot be scientifically defended at our p process is to 
ment. I said earlier that the first purpose of a p num- 
ber of assumptions. To a greater second and 
illustrations of postulated processes can do this. But, 
critical purpose of a phenomena not used in 
stimulating conditions and unless conditions there is no way to 
in some manner to the stimulating Thus, the explanatory 
make a prediction from the finding out whether it will 
another. A postulated process explain anything but the 

initial stages of postulation of a Level-4 concept there is only one 
purpose which can justify the postulation. The scientist, in postulat- 
ing the process, brings under a single explanatory idea or principle 
several independently defined phenomena. That is, phenomena A, 
B, C, and Y are all said to be manifestations of the single process, X. 
We thus see that the scientist is, in such instances, adhering to the 
basic idea of explanation, namely, to account for the greatest num- 
ber of observations with the fewest number of assumptions. (I 
shall later discuss the fact that most explanatory attempts of this 
type actually involve two postulated processes or two independent 
components of a single process.) So then, the initial purpose of the 
postulation of a Level-4 concept is to bring several phenomena into 
a single explanatory orbit. 

Some illustrations of Level-4 concepts. At the present time I shall 
place no restrictions on the kind of postulated processes which I will 
use as illustrations. That is, I shall give these illustrations without any 
statement as to whether or not they might be considered scientifi- 
cally valid postulated processes. With all restrictions removed, the 
number of postulated processes in psychology is legion. Because 
most of these are well known I shall indicate them by name only. 

I suspect that no single idea has been so often reflected in postu- 
lated processes as has that of inhibition. The essential idea involved 
is almost always that of a dampening effect on certain types of 
behavior or a lowering of response potential. The idea of inhibition 
is old. Sherrington and Pavlov postulated ideas of a central inhibi- 
tory state, the behavior they observed being largely directed at 
physiological or neurological mechanisms. We have Hull’s reactive 
inhibition, Kohler’s satiation, Ammons’ temporary work decrement, 
Osgood’s reciprocal inhibition, Glanzer’s stimulus satiation and even 
Freud’s censor fits the general category. These postulated processes 
were not, of course, all involved in explanation of the same phenom- 
ena, but each was invented as a vehicle to bring several independent 
phenomena into the same explanatory system. 

Now for a few other illustrations not limited to ideas of inhibition. 
Hull’s anticipatory goal reaction, Lewin’s tension, Hebb’s cell assem- 
bly, Krech’s dynamic systems, Birch and Bitterman’s sensory integra- 
tion, Cofer and Foley’s mediated stimulus generalization, direction, 
as used by Maier, and on and on. These are postulated processes or 

The hypothetical relations which are assigned between the stim- 
ulus variables and the hypothetical process are, within limits, a 
product of the scientist’s imagination. This may be a somewhat 
grandiose term for the scientist’s behavior. For, what does 
when he has some reliably established 
to bring under a simple explanatory system? What he 
pears, is to proceed through a series of trial-and-error, 
deductive circles. (If my relatively unsuccess circles.) 
criterion, it is a very agonizing series of inductive- 
The scientist has a set of facts; he must assign deduce 
ess (or processes) characteristics which will 
these facts. The characteristics he assigns the 
lated in some fashion to relevant stimulus 
two facts he may induce certain characteristics 
processes must have; he then examines other again, 
be deduced from the assigned characteristics. 
this time assigning different his facts. All the data 
the proper combination, 
he has must be represented in the hyp 
directly or in terms of being deductively g 

variable. This may be a gross verbal statement, i.e., X increases as 
some function of the stimulus, or it may be a precise mathematical 
statement, i.e., X = Si* X 82“*®“*. All postulated processes involve a 
statement of a relationship between the process and behavior if no 
more than to say that X caused the behavior. When the postulated 
process or state is not tied to stimulus conditions we are in serious 
danger of developing spooks or pixies over which we have no scien- 
tific control. The spooks multiply, divide, excite, repress or inhibit in 
a manner dictated by observations of behavior but without reference 
to stimulus control. Some of the Freudian concepts, such as libido 
and id are of this nature, as is also, in my opinion, Hull’s oscillation. 
These ideas might be brought under experimental scrutiny by stat- 
ing what conditions cause them to vary in amount but it is a fact 
that such statements have not been made. At least in our stage of 
development we are not equipped to cope with such concepts. 
Therefore, I am omitting them from further consideration at this 
time and shall proceed with further characterization of what I will 
call acceptable Level-4 concepts. 

The growth of Level-4 concepts. As noted, the scientist initially 
has reliable phenomena before him when he postulates a Level-4 
concept to symbolize processes which he will use in an attempt at 
explanation. Furthermore, for any given phenomenon, some stim- 
ulus-variable elaboration has usually taken place, i.e., the phenom- 
enon is known to vary in certain ways when certain stimulus con- 
ditions are manipulated. Now, the postulated process does not remain 
conceptually amorphous; it is assigned certain characteristics. These 
characteristics are related (co-vary in some fashion) with the stim- 
ulus manipulations and, in turn, the response (behavior) is related to 
variation in the hypothetical process. 

Let me draw a somewhat sharper contrast than probably exists 
between Level-3 Level-4 concepts. A Level-3 concept (which is 
also representative of a hypothetical process) is completely faithful 
to its defining components, that is, to the stimulus manipulations and 
to behavior. It transmits perfectly and directly. If a response is 
shown to be an exponential function of a stimulus variable, the 
hypothetical process of a Level-3 concept is said to be this same 
function of the stimulus variable and the response the same function 
of the hypothetical process. In contrast, Level-4 processes need not 
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and usually do not stand in these isomorphic relationships; rather 
the characteristics assigned initially are those necessary to mediate 
(via deduction) the observed behavior. These characteristics are said 
to vary as some function of the stimulus variables but this same 
function does not hold between the hypothetical process and behavior.
The obvious implication of this is that the hypothetical process
or state must in some way modify the input of a given stimulus
variable. Furthermore, this modification must come about because 
either another component of the hypothetical process is related by a 
different function to the same stimulus variable or a separately 
postulated process which enters into the particular response under 
consideration is so related. Thus, measured behavior results from 
the interaction of these two processes under specified observational 
conditions. And, of course, there may be more than one stimulus 
variable which is related to the hypothetical processes and more than 
one response measure related to the processes in a different manner.
Just what relationships are ascribed depends on the number and
nature of phenomena to be incorporated.
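
The contrast can be made concrete with a small numerical sketch. It is only an illustration under assumed functions: the exponential growth curve, the linear second component, the subtractive rule of interaction, and every function name below are hypothetical choices, not relationships taken from any particular theory.

```python
import math


def level3_process(stimulus):
    """Hypothetical Level-3 process: the same exponential function of the
    stimulus variable that the response itself was observed to follow."""
    return 1.0 - math.exp(-stimulus)


def level3_response(stimulus):
    """A Level-3 concept transmits perfectly: response equals process."""
    return level3_process(stimulus)


def component_a(stimulus):
    """Assumed first component: negatively accelerated growth."""
    return 1.0 - math.exp(-stimulus)


def component_b(stimulus):
    """Assumed second component: a different (linear) function of the
    same stimulus variable."""
    return 0.1 * stimulus


def level4_response(stimulus):
    """Measured behavior as the interaction (here, a simple difference)
    of the two components."""
    return component_a(stimulus) - component_b(stimulus)


if __name__ == "__main__":
    for s in (1.0, 2.0, 5.0, 10.0):
        print(s, round(level3_response(s), 3), round(level4_response(s), 3))
```

Under these assumptions the Level-3 response simply mirrors its process, while the Level-4 response first rises and then falls as the stimulus value increases, a form shown by neither component alone.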

The hypothetical relations which are assigned between the stim- 
ulus variables and the hypothetical process are, within limits, a 
The initial success of the scientist's efforts depends solely on how
many of the available data he can successfully incorporate. But, al- 
most immediately we would also gauge his success on how simply 
the incorporation takes place. His assigned characteristics cannot be 
so numerous, and his interactions so complex that for each set of 
data a special characteristic is required. In the long run the principles 
used in the accounting must be fewer than the facts for which they 
account. 

Let us suppose now that the scientist has assigned properties to his 
hypothetical process or processes and that on the basis of a few such 
properties he can account for an appreciable range of facts. I for 
one would not minimize such an achievement but at the same time I 
would insist that he has accomplished only the first stage of scien- 
tific explanation. Since properties were assigned to the process on 
the basis of the facts to be explained they must incorporate those 
facts; they were assigned so they would incorporate them. The 
scientist must now direct his attention to the predictive capacity of 
his assigned characteristics. Do these characteristics represent prin- 
ciples with greater generality than those evident in the limited data 
from which they were derived? What “undiscovered phenomena” 
can be predicted by the principles? The principles may predict (a) 
known phenomena not used to induce the principles, or (b) new
phenomena (new relationships) which have never been investigated.
In both cases the scientist reasons pretty much as follows: “If the
characteristics I have assigned these processes (or components of a 
single process) are valid, then it must be predictable (deducible) 
that if such and such is done, such and such will happen.” If this 
“what will happen” is independent operationally from those phenomena
from which the assigned characteristics were induced in
the first place, we have the gratifying experience of a theory predicting,
and if the prediction is tested by research and confirmed, we
have an even more gratifying experience. But, aside from a consideration
of the hedonic state of the investigator we can see that
progress is being made in incorporating an expanding body of empirical
relationships under a few basic principles. Theoretical notions,
the hypothetical processes, constituting a system cannot be closed
or sterile. It must have provisions for reaching out and encompassing 
new facts in an expanding science. If it does not do this it will sooner
or later be replaced by other notions which will do it. 

It should be noted that in contrast to Level-4 concepts, Level-3 
concepts per se have no predictive power over and above the specific 
relations which led to their definition. Only if the investigator 
postulates an interaction of a specific kind with another process does 
the Level-3 concept take on an aura similar to the Level-4 concept.

Having set forth the growth of a Level-4 concept in somewhat 
abstract, perhaps idealized form, we should turn to a concrete 
illustration. 

A detailed illustration of a Level-4 concept. For an illustration I
will use the concept of reactive inhibition. It is a fairly familiar 
concept and, therefore, I may hope for some transfer. In discussing 
the concept I shall also make use of another concept, excitation, 
which interacts with reactive inhibition in determining measured 
behavior. I will not, however, set forth a detailed account of the 
history of reactive inhibition. What I wish to show is how the 
apparent necessity for such a concept arose and how it was said to 
interact with the excitation process. I will then indicate how these
conceptions led to the incorporation of phenomena beyond those 
which suggested the processes and their characteristics. I shall take 
the liberty of simplifying certain issues and of trying to reconstruct 
the thought processes of the scientist. Finally, by way of introducing 
this illustration, let me say that I am fully aware that it suffers certain 
shortcomings in being unable to account for some facts and that 
alternative conceptions have been offered. I am not particularly 
concerned with the relative worth of the alternative theoretical 
conceptions but with the nature of this one conception as an illus- 
tration of postulational thinking. Reactive inhibition is Hull’s term, 
and my illustration follows Hull, but it should be recognized that 
there were others before Hull who used essentially the same ideas, 
notably Pavlov. 

Two empirical phenomena led to the idea of an interaction be- 
tween an excitatory process and an inhibitory process. These phe- 
nomena are experimental extinction and spontaneous recovery of 
conditioned responses. Experimental extinction is the decrement in 
performance following removal of the unconditioned stimulus, and 
spontaneous recovery is the increment in performance with the passage
of time following extinction. Now, if we put ourselves in the
theorist’s position, wanting to unify these two phenomena, what 
might we suggest? The fact that learning must necessarily take place 
before extinction indicates that we have to have some process which 
produces an increment in response strength, i.e., the excitatory com- 
ponent. But, the fact that during extinction the organism responds 
with less and less magnitude (or frequency, or latency) offers pos- 
sibilities for choice. In simple form, one could conceive of an actual 
decrease of the excitatory strength during extinction; or, one might 
suggest that another process is masking the effect of the excitatory 
component; or, some combination might be involved. For simplicity 
and mediating power, the alternative chosen was that of saying that 
the excitatory strength does not change during extinction; rather, 
reactive inhibition is built up during extinction thus masking the 
influence of the excitation. More specifically, it was asserted that 
every time the organism makes a response a certain amount of reactive
inhibition is generated. Such a statement, of course,
means that reactive inhibition is not limited to extinction; it occurs
any time an organism makes a response. Furthermore, the amount of
inhibition generated by a response was specified in terms of amount 
of energy or work required to make the response. Spontaneous 
recovery would suggest to the theorist (or so it seems in retrospect) 
that the inhibition which develops must disappear with passage of 
time. Since the excitation does not change with time, and reactive 
inhibition does dissipate with time, the passage of time following 
extinction would leave the excitation component relatively stronger. 
Thus, on the basis of the two phenomena, the ideas of excitation 
and reactive inhibition interaction were developed. Now let us see 
what was done up to this point. 
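
The assigned characteristics can be written out as a toy model. This is a minimal sketch under stated assumptions, not Hull's own equations: the subtractive masking rule, the fixed increment per response (standing in for the work the response requires), the exponential form of dissipation, and all names and parameter values are hypothetical.

```python
import math


def net_strength(excitation, inhibition):
    """Measured response strength: excitation masked by inhibition."""
    return max(0.0, excitation - inhibition)


def extinction_trials(excitation, n_trials, inhibition_per_response=1.0):
    """Run massed extinction trials. Excitation is held constant; each
    response that occurs adds a fixed amount of reactive inhibition."""
    inhibition = 0.0
    strengths = []
    for _ in range(n_trials):
        strength = net_strength(excitation, inhibition)
        strengths.append(strength)
        if strength > 0.0:  # only an actual response generates inhibition
            inhibition += inhibition_per_response
    return strengths, inhibition


def dissipate(inhibition, rest_time, decay_rate=0.1):
    """Inhibition decays with the passage of time (assumed exponential)."""
    return inhibition * math.exp(-decay_rate * rest_time)


if __name__ == "__main__":
    strengths, inhibition = extinction_trials(excitation=10.0, n_trials=12)
    print("extinction:", strengths)
    for rest_time in (0.0, 10.0, 30.0):
        recovered = net_strength(10.0, dissipate(inhibition, rest_time))
        print("rest", rest_time, "-> strength", round(recovered, 2))
```

In this sketch response strength falls across extinction trials although excitation never changes, and strength returns after a rest interval, more fully the longer the rest. Those are the two phenomena the characteristics were assigned to cover.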

The thought processes of the scientist were inductive initially. 
There was a certain set of facts available which, in his way of think- 
ing, required the postulation of two interacting processes to account 
for them. Each process is assigned certain characteristics and a state- 
ment is made of how the processes interact. The characteristics as- 
signed each process and the interactive idea are entirely those 
demanded by the data available. The characteristics assigned are not 
pulled out of the blue; they are assigned because they will 
“account” for the obtained results. Other combinations of characteristics
might have served equally well or better, at least initially.
When we say, therefore, at this point, that our postulated character- 
istics account for the observed facts we realize we are being com- 
pletely circular since we have assigned those characteristics because 
they would account for the facts. I think most would agree that
explanatory attempts should not stop at this point. 

We must look at the implications of the characteristics assigned; 
in so doing we arrive at the clear deductive characteristic of our
Level-4 concepts. For we ask, if these processes “really” have the 
characteristics assigned, what new phenomena may be predicted? 
A little consideration of the postulated characteristics will show that 
the following may be predicted (to sample a few): distributed 
practice should give better performance than massed during the 
acquisition of a conditioned response; massed practice should produce
more rapid extinction than distributed practice; differences in
rate of spontaneous recovery should occur as a function of number 
of extinction trials and as a function of amount of work during ex- 
tinction. Certain predictions can be made concerning alternation 
behavior. The concepts have been used pretty much as given here 
in explaining certain rote-learning facts; they have been found useful
in accounting for certain facts of motor learning. Thus, a few basic 
ideas have been shown to be capable of bringing a large number of 
rather diverse facts together. Let me assert again that at this point 
I am not interested in the adequacy of this formulation as compared 
with others; it is the attempt that I am concerned with, since it 
illustrates well the nature of considerable theorizing in psychology 
which meets our general criterion of attempting to integrate a large 
number of empirical relationships by a few basic notions. 
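
Two of these deductions can be checked against a toy version of the postulates. The sketch below is hypothetical throughout: it assumes a subtractive interaction, a fixed increment of inhibition per response, exponential dissipation between trials, and a simple growth rule for excitation during acquisition; the names, the criterion, and the parameter values are illustrative only.

```python
import math


def trials_to_extinction(interval, excitation=10.0, per_response=1.0,
                         decay_rate=0.2, criterion=5.0, max_trials=1000):
    """Number of the first extinction trial on which net strength is at or
    below the criterion, or None if that never happens."""
    inhibition = 0.0
    for trial in range(1, max_trials + 1):
        if excitation - inhibition <= criterion:
            return trial
        inhibition += per_response                      # response adds inhibition
        inhibition *= math.exp(-decay_rate * interval)  # rest dissipates it
    return None


def acquisition_performance(n_trials, interval, asymptote=10.0,
                            learning_rate=0.2, per_response=0.5,
                            decay_rate=0.2):
    """Net strength after n_trials of acquisition at a given intertrial
    interval. Excitation grows toward an asymptote and does not decay."""
    excitation = 0.0
    inhibition = 0.0
    for _ in range(n_trials):
        excitation += learning_rate * (asymptote - excitation)
        inhibition += per_response
        inhibition *= math.exp(-decay_rate * interval)
    return max(0.0, excitation - inhibition)


if __name__ == "__main__":
    print("massed extinction:", trials_to_extinction(interval=0.0))
    print("distributed extinction:", trials_to_extinction(interval=0.5))
    print("massed acquisition:", acquisition_performance(10, interval=0.0))
    print("distributed acquisition:", acquisition_performance(10, interval=2.0))
```

With these assumed values the massed schedule reaches the extinction criterion in fewer trials than the distributed schedule, and distributed acquisition ends at a higher net strength than massed acquisition. Neither outcome was among the facts from which the characteristics were induced.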

Most explanatory attempts, such as the one briefly outlined above, 
are usually prefaced by an insistence of the scientist that it is a tenta- 
tive formulation. Failure of prediction of one or more relationships 
may result in modification or abandonment of the ideas, although the 
latter is not done lightly. Abandonment of theoretical efforts is not 
easily accepted, probably because they always have a certain amount 
of predictive power, even though it be incomplete. Theoretical ideas 
seem to be lost only when a better set of ideas is available to replace 
them. 

Are Level-4 concepts defined? Level-4 concepts are not defined in
the sense that Level-1, -2, and -3 concepts are. One does not operationally
define a postulated process; one defines a phenomenon,
albeit the name may be given to the uncharacterized hypothetical 
process which is said to cause the phenomenon (Level 3). How,
then, do we transmit meaning for Level-4 concepts if not by definition?
Scientific meaning is given to Level-4 concepts by relating them
to stimulus and response variables. I think Marx’s term of operational
validity is particularly appropriate for this situation (13). The meaning
of the concept is given by specifying in what relation it stands to
at least one stimulus variable and at least one response variable. These 
are the minimum requirements demanded for the operational validity 
of a scientific concept. That is, if the proposed relationships can be 
put to empirical test directly or indirectly, the concept has oper- 
ational validity. This testability primarily obtains when the concept 
is tied to stimulus and response variables. Note, in contrast, that one 
does not put to empirical test an operational definition; an oper- 
ational definition, one based on acceptable scientific procedure, is not 
open to question and as it stands it has no deductive consequences. So, 
we define Level-4 concepts only in the sense that we characterize 
them by making statements of their relation to stimulus and response 
variables, and to other concepts, and that is all we do. 

Are such concepts unique to psychology? The other sciences have 
long made use of concepts having the essential characteristics of 
Level-4 concepts. Level-4 concepts summarize postulated relation- 
ships, although in many cases a particular name is not assigned the 
relationship. But, there are many such names, such as atoms, mol- 
ecules, and genes which were originally postulated in the manner 
of a Level-4 concept. There is the postulated characteristic of light 
energy as corpuscular and an opposing conception of wave-like 
action. All such notions summarize certain observations and lead, 
when combined with other concepts, to the prediction of certain 
relationships which must obtain if the assigned characteristics are 
valid. Indeed, it seems that postulational behavior of the scientist 
must essentially be what is usually meant by theorizing. 

It should be noted that in the other sciences the postulation of 
processes or entities akin to Level-4 concepts has sometimes led
to observations which confirmed the existence of the process or 
entity. Thus, while the gene was originally a postulated entity, later
and more refined observations led to the discovery of a structural 
entity. Some philosophers of science (e.g., 2) and some psychol- 
ogists (e.g., 11) believe this to be one of the primary functions of 
such concepts. That is, the characterization of the processes or en- 
tities through inferences resulting from experimentally derived 
relationships may lead investigators to search for structures or proc- 
esses which were originally built up as convenient but useful fictions. 
This might not seem on the surface to be an issue of much moment; 
but, if we ask whether the postulated process should or should not
have possibilities of being itself discoverable we find that consider- 
able heat has been generated on the issue with regard to theorizing in 
psychology. It will be necessary to return to this problem later if 
we are to succeed in reflecting the temper of contemporary psy- 
chology. 


I shall not dwell long on these Level-5 concepts. Perhaps in a
sense they should be included as a special case of Level-4 concepts.
Though they are infrequently used by theorists today and though
they share certain characteristics with Level-4 concepts, I should
like to keep them separate for the sake of completeness. Essentially 
these concepts are general summarizing concepts; they summarize
the interaction of other postulated processes in an explanatory system.
Suppose that X and Y are Level-4 concepts which are said
to summate to produce the measured response. We might then add
a Level-5 concept summarizing them and the response is said
to be produced by the process indicated by the Level-5 concept.
This, of course, would be the simplest possible case and might seem 
to introduce a redundant concept. But, when several concepts enter 
into a system several Level-5 concepts may appear and they may
help in simplifying the conceptual problem resulting from the inter- 
actions of several postulated processes. 

I believe the best illustrations of Level-5 concepts are given in
Hull’s work (7). For example, reaction potential, reactive inhibition, 
and conditioned inhibition combine to produce effective reaction 
potential, this latter term therefore being a Level-5 concept. But, in
turn, effective reaction potential combines with a hypothetical 
oscillatory function (Level 4) to produce momentary effective reaction
potential. Thus, we have combinations of combinations at
Level 5. I shall spend no more time on these concepts since at the
present time they occur in a very small proportion of explanatory 
attempts. 
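
The layering of these summarizing concepts can be sketched in a few lines. The text above says only that the terms "combine"; the subtractive rules used here are an assumption about the form of the combination, and the function names and numerical values are hypothetical.

```python
def effective_reaction_potential(reaction_potential, reactive_inhibition,
                                 conditioned_inhibition):
    """Level-5 summary: reaction potential reduced by the two inhibitory
    terms (subtraction is an assumed rule of combination)."""
    return reaction_potential - (reactive_inhibition + conditioned_inhibition)


def momentary_effective_reaction_potential(effective, oscillation):
    """A combination of a combination: the Level-5 value above reduced by
    the momentary value of the oscillatory function (assumed rule)."""
    return effective - oscillation


if __name__ == "__main__":
    effective = effective_reaction_potential(10.0, 2.0, 1.5)
    print("effective:", effective)
    print("momentary:", momentary_effective_reaction_potential(effective, 0.5))
```

The second function takes the output of the first as one of its inputs, which is all that is meant by combinations of combinations.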

FURTHER COMMENTS ABOUT THE FIVE LEVELS 

In reviewing the five levels, let me first repeat some statements 
made early in the chapter. I am not deceived that the levels are all- 
inclusive; nor do I feel that it is easy to differentiate the concepts to 
be included in each. I have found concepts in the literature which 
I could not discriminate satisfactorily as belonging to only one of 
the levels. These are cases in which the status of a concept cannot be 
determined; one cannot tell how it was introduced, for what pur- 
pose, and its relationship (if any) to other concepts. And, I have 
indicated that the same word (summarizing an idea or notion) may 
be used in different ways (at different levels) by different writers. 
Yet, there must be some way by which we can sharpen our thinking 
about concepts used in explanations of behavior and the present is 
one such attempt. I hope, furthermore, that by using the differences 
discussed in this chapter and the further differences to follow we 
may find it possible to arrive at a fair understanding of both the 
formal status of the concept and possibly some understanding of the 
thought processes of the scientist as he introduces and uses the 
concept. But, I see that I may be expecting too much from the 
written word. 

Let me also point out that as thinking develops around a concept 
its status may change. The levels I have outlined in this chapter are
concerned with the initial status of a concept; that is, how it was 
first introduced by the scientist. There is stability of “placement at 
a level” only so long as the concept continues to be used in the same 
manner in which it was first introduced. Inevitably, however, as 
explanatory attempts grow, there is an elaboration of concepts, especially
toward stating of new relationships with other concepts. I
think this is especially true with Level-3 concepts. Introduced originally
by a definitional procedure they may soon be related to Level-4
concepts and indeed, take on the aura of Level-4 concepts. Their 
initial formal status is often obscured by these relationships. Thus,
while the Level-3 concept is introduced by definition (not postulation)
it may soon be said, be guessed or postulated, to be related to
other processes in such and such a manner. For example, drive, 
introduced by definition, may be postulated to interact in such and 
such a way with associative strength of a habit, or may be said to 
interact in such and such a manner with other independently defined 
drives. It is a fact that this is fairly common practice in psychology
today. Such an elaboration, of course, makes them more than repre- 
sentations of defined phenomena; they become part of an explanatory 
attempt having the objective of greater inclusiveness. Although I 
have been somewhat critical of Level-3 concepts (as opposed to 
Level-2) their saving grace may lie in the apparent fact that the
scientist finds it easy to postulate relationships with other processes 
when the cause is introduced as a part of the defining statement. 

What other distinctions have been made? I would be misrepresent- 
ing the situation if I did not make it plain that many other writers 
have faced the problems of concept differentiation and have each in 
his own way made certain distinctions. I shall sample some of the
contemporary ones and indicate where they appear to fit into the 
scheme developed in this chapter. 

Taken as a group, concepts at Levels 3, 4, and 5 have been traditionally
known as intervening variables. A distinction between intervening
variables and hypothetical constructs made by MacCorquodale
and Meehl (11) has suggestions of the distinction made here
between Levels 3 and 4. MacKinnon’s (12) phenomenal concept is
similar to Level 3 when operationally elaborated and his fictional
concept similar to Level 4. If I understand O’Neil (15), his hypothetical
relations are very nearly the same as Level-2 concepts. His
uncharacterized hypothetical term might be a Level 3 or a Level 4
at the time the latter first germinates in the scientist’s mind. His 
characterized hypothetical term is clearly of the Level-4 type as 
discussed in this chapter. I should also mention in passing that the 
diversity of terms is not indigenous to psychologists as can be seen 
if one turns to the writings of philosophers of science. We all seem 
to be beset by the plague of individualism in our language. I must 
indeed apologize for adding more terms to this collection; my only 
defense (and, common as it is, it is a weak one) is that I found
myself quite incapable of organizing my work around the distinctions
which have previously been made.


REDUCTIVE VERSUS NONREDUCTIVE CONCEPTS 

As the term is most commonly used in psychology, reductionism 
means explanation of behavior by means of physiological or neuro- 
logical concepts. More pointedly, in the context of this chapter, if 
reductive concepts were introduced at Levels 3, 4, or 5, they would 
refer to more or less specific neurophysiological mechanisms. And so, 
presumably, we have another dimension on which concepts may 
differ, namely, neurophysiological versus what I shall call strictly 
psychological or behavioral concepts. It may be surprising that 
rather strong opinions prevail among contemporary psychologists 
on this issue. But they do, and to discover the essentials of cur- 
rent thinking about the place of physiology, if any, among 
explanatory attempts in psychology I must take a brief foray into 
this arena. 

What is the controversial issue? I suppose it could be said that 
there are several controversial issues but they all essentially cling 
around such questions as the following. Should or should not ex- 
planatory attempts of psychological events be at the neurophysio- 
logical level? Do we have better or “truer” explanation when we 
use neurophysiological concepts? Should we “require” explanatory 
concepts to be neurophysiological concepts? Is it possible to have 
high-order generalizations as axioms in an explanatory system of 
behavior unless these generalizations are at the neurophysiological 
level? Why stop at the neurophysiological level; why not the 
chemical or biochemical or atomic, and so on? 

Let me suggest that it is no simple matter to distinguish clearly 
between psychological and neurophysiological concepts. It is a fact 
that behavioral events are sometimes used to define and infer physio- 
logical phenomena; and so also neurophysiological events are used 
to infer behavioral phenomena. Therefore, defining operations offer 
little by way of differentiating disciplines. It seems to me a simple 
fact that we have difficulty telling just whether a given explanatory
concept is psychological or neurophysiological. As a rough means of
describing the nature of concepts which may be introduced as 
explanatory concepts by psychologists, I will indicate a complex
continuum at one end of which are located the strictly psycho- 
logical or behavioral concepts while at the other end are the strictly 
neurophysiological concepts. 

1. Strictly psychological or behavioral. A concept of this type 
does not have any neurophysiological implications, is not a term that 
is ever used at the neurophysiological level and the originator of the 
term gives no indication that he was thinking in neurophysiological 
ideas when he introduced the term. I think terms like frustration, 
reminiscence, super-ego, and morale fit this end of the dimension. 

2. There are concepts which are neurophysiological “sounding” 
(i.e., may have been used by neurophysiologists) but no evidence is
available that the originator (or subsequent users of the term) was 
thinking in a neurophysiological manner. Memory trace is a concept 
which has appeared for many years in psychological literature which 
fits this area on the dimension. So also does engram. Refractory 
phase has been occasionally used as an explanatory concept at the 
psychological level but in a way in which it is quite clear that it is 
no more than a rough analogy to refractory phase of the neuron. 

3. Concepts that may or may not be neurophysiological “sound- 
ing,” whose usefulness does not depend on neurophysiological 
facts, but the originator (or subsequent users) made it clear that he 
was thinking of possible neurophysiological counterparts or physio- 
logical mechanisms that might lie behind the behavioral process or 
state. Many of Hull's concepts are of this nature, e.g., stimulus trace. 
Hull’s copious notes leave little doubt that he was continually refer- 
ring his ideas to the physiological level. Yet, as many have pointed 
out (almost apologetically) the evaluation of the theoretical ideas 
fostered by Hull does not depend one bit on references to the 
physiological data; the system is evaluated at the psychological or 
behavioral level even though Hull may have found it useful and 
intriguing to speculate about the neurophysiological counterparts 
of the relationships at the psychological level. 

4. Strictly neurophysiological. In these cases either real or postu- 
lated neurophysiological mechanisms are used in explanatory at- 
tempts. Thus, theories of vision inevitably make reference to the 
physiological or perhaps chemical processes. The term has complete 
neurophysiological implication, it is commonly used at the neurophysiological
level or an offshoot of this, and the user leaves no
doubt that he intends his concept to refer to neurophysiological
processes. Kohler’s electrical fields, Hebb’s phase sequences, Krech’s
dynamic systems are all concepts which fit at this end of the continuum.

I suspect that the current spate of rather caustic expressions on 
neurophysiological versus psychological theorizing was triggered by 
MacCorquodale and Meehl (11) in a rather unwitting fashion. In
closing an analysis of differences among certain kinds of concepts
in psychology these writers suggested that theorists must be more 
concerned with the reality status of postulated processes or entities. 
More specifically they said that concepts such as Level-4 concepts 
should represent entities or processes that “have some probability of 
being in correspondence with the actual events underlying the be- 
havioral phenomenon, i.e., that the assertions about hypothetical 
constructs be true” (11, p. 105). What they mean by true is that
the construct “should not be manifestly unreal in the sense that it 
assumes inner events that cannot conceivably occur” (p. 105). 

While these writings by MacCorquodale and Meehl do not sug- 
gest quite as blunt an issue as I indicated by my introduction, it is 
clear from the context that these writers feel that in postulating 
processes the theorist should pay some attention to neurophysio- 
logical facts in assigning the properties to postulated processes. Yet, 
they realize that one might assign properties to a hypothetical process
which at a later date would find correspondence, indeed even
aid in finding correspondence, in neurophysiology. Their major
plea is not to assign properties that are contrary to known neuro- 
physiological facts or that seem highly improbable. These writers 
would seem to accept theorizing at the psychological level and if 
knowledge of corresponding neurophysiological mechanisms is lacking,
it is quite possible that such theorizing would lead to neurophysiological
research in an attempt to discover mechanisms mediating the
behavioral process. Thus, it seems to me that all things considered 
this was a rather subdued plea for psychologists to show greater con- 
cern with neurophysiology when theorizing. It remained for Krech 
to take a completely positive position on this issue. I think Krech’s 
words are worth quoting in order that his position can be clearly 
understood. 

But the moment we introduce hypothetical constructs into our theory
building, then the purely psychological approach becomes untenable. 
I have argued that it is untenable because it makes forever impossible 
any attempt to approach the study of our hypothetical constructs in 
any more direct manner than through the examinations of the original 
stimulus-response correlations. This is so . . . because the psychological 
position places hypothetical constructs in a domain which, by definition,
is forever removed from any direct observation (for that domain
. . . is neither behavioral, experimental or neurological) (10, pp. 287-288).

Where, then, can we place our hypothetical constructs and what 
can their nature be? The answer I have come to, on the basis of all of 
the above considerations, is a simple one and one which is not at all 
new. It seems to me that the most fruitful thing to do would be to take 
the plunge and announce that henceforth our hypothetical constructs 
(through the use of which we hope to understand all behavior and 
experience) are to be conceived of as molar neurological events, that
and nothing more. Such a step amounts to accepting the universe, and 
such a step may help us to avoid some of the confusion, esotericisms 
and circular reasonings that we all have been guilty of in times past.
Once having made such a step we would then be in a position to 
manipulate hypothetical constructs, to have phantasies about the in- 
trinsic attributes of these hypothetical constructs and, on the basis of 
such hunches, to look for new relationships among the primary data of 
psychology. And what is most significant, such a step will permit us 
at least to entertain the hope of eventually being able to study our
hypothetical constructs more directly than through guess and hunch 
(p. 288). 

Thus we see that Krech takes quite a dim view of our phantasies at 
the psychological level of theorizing but would admit, indeed recom- 
mend, that such uninhibited phantasizing continue as long as it is 
directed toward the neurophysiological level. I suspect that the 
general point of view that we should move rapidly toward theories 
of behavior based upon neurophysiological concepts has its most 
reasoned contemporary impetus from the scholarly book by Hebb,
published in 1949 (6). In 1939 Pratt (16) made a very strong
plea for a physiological language of explanation. But, because of the 
Zeitgeist or because Pratt said so many other things that aroused 
controversy so as to obscure this particular stand, the Hebb book
stands as a more prominent contemporary landmark. Hebb frankly
sets out to account for certain behavioral phenomena on the basis of
neurophysiology of the nervous system. He takes a little bit of 
neurophysiological fact here, a little there, adds some clearly labelled 
speculation and arrives at an accounting of some behavioral facts. 

Farrell (5) in examining the basic explanatory generalizations in
some other sciences notes that they are usually small units, e.g., cells, 
molecules, atoms, chromosomes. When he then asks himself what 
comparable units could be obtained at the strictly behavioral level in 
psychology he reaches an impasse. He then, by logic which is quite 
unclear to me, concludes, therefore, that psychology must look to 
neurophysiology for its fundamental generalizations. Experimental 
research at the behavioral level should continue, he thinks, with the 
aim of arriving at basic generalizations but with the intent of clearly 
outlining the behavioral laws for which the neurophysiological 
mechanisms must account. 

And so with samples of one point of view before us, let us look 
at what is said on the other side. Some quotes from Kessen and 
Kimble directed specifically at Krech will demonstrate the tenor of 
this side. 

We object to his [Krech’s] premise—that psychological concepts must 
be neurological—and to the sort of theorizing to which this conviction 
leads him. Even more, we oppose the assertion that psychological 
theory can progress only when we are willing to indulge in neuro- 
logical speculation. In direct contrast, we hold that there is nothing 
intrinsically more fruitful in physiological theory than in any other 
kind; further that what Krech calls purely psychological theory is 
actually in a stronger position insofar as it remains uncluttered by an 
anachronistic search for “reality” and “true essences” (^, p. 263). 

. . . constructs have no more location than the physicist’s concept 
of force (p. 263). 

... theoretical constructs [are] designed to aid in predicting be- 
havior. The extent to which they accomplish this end is a measure of 
their value, from which their lack of “neurologicity” subtracts 
exactly nothing (p. 263). 

Our version of the purely psychological psychologist is the scientist 
who erects his theory and develops his concepts so that the deduced 
theorems can be confirmed or disproved by observations of behavior. 
This we demand of him, and nothing more. The symbols he uses for 
theoretical manipulation may have any flavor he likes-neurological, 
physical, sociological, aesthetic-but such a psychologist is not required 
to specify locus or “real” nature in his theory so long as his concepts 
mediate the prediction of behavior (pp. 263-264). 

Adams (1) first takes Krech to task and then later aims his pungent 
remarks at MacCorquodale and Meehl. Like Kessen and Kimble, 
Adams points out that the answer to Krech’s petulant inquiry as to 
where hypothetical constructs exist, is: “In exactly the same ‘physical’ 
world or Nature as the atom or electron” (p. 68). To Adams, 
Krech’s question is puzzling but in no sense muzzling. He will not 
see science hamstrung by any such set of regulations as Krech sug- 
gests: 

When Krech speaks of “a proper respect for present neurological 
knowledge and theory” ... I think he is dead wrong, in spite of his 
comprehensiveness and important qualifications. The only things to which 
an inquiry owes respect are its phenomena. The attitude of respect on 
the part of an empirical science is never appropriate toward existing 
principles of its own or any other field of inquiry. You break out 
of the bonds of a doctrine and enlarge it only by not having respect 
for it. We are inherently conservative enough without submitting to 
such restrictions (p. 69). 

And to the MacCorquodale-Meehl suggestion that we should not 
admit hypothetical constructs which “require the existence of en- 
tities and the occurrences of processes which cannot be seriously 
believed because of other knowledge” (11, p. 106), Adams replies: 

. . . when a notion shows a good deal of versatility and seems to be 
applicable to a variety of phenomena beyond that for which it was 
designed, it becomes a valued construct, irrespective of the immediate 
plausibility of the mechanisms it envisages (p. 73, italics omitted). 

Finally, with regard to Adams, I should note that he gives Hebb 
pretty much the same treatment as he has the others. Adams’ position 
on the issue seems fairly clear. 

Bergmann has dissected the assertions of MacCorquodale and 
Meehl and of Krech and has found them wanting philosophically. 
He notes that “logically and in principle, physiological reduction is 
a certainty” (j, p. 442). But that this is true does not in any way 
eject nonphysiological notions from theories. Relations and properties 
(states), the stuff of which theories are made, do not literally 
occupy space and yet are as real in a scientific and philosophical 
sense as a nerve or a piece of steel. 

I think this is enough on this issue (or is it really an issue?). Others 
have spoken out at various times and if one wishes to pursue the 
matter further among the writings of psychologists I would suggest 
Pratt (16), Marx (i^), MacKinnon (/^), and Davis (4). I have 
indicated four areas along a rather complex dimension which might 
be used as reference points when evaluating the status of a concept. 
I have also shown that where along this continuum concepts should 
be is a matter of considerable argument. I am afraid that this is one 
issue at least where I must take a position that someone will describe 
as vapid eclecticism. But, we are interested in understanding be- 
havior and we should reject nothing which furthers that under- 
standing. Psychology has and will continue to have (in increasing 
numbers, I believe) physiological concepts in its theories and 
psychological concepts in its theories. At the present time physiological 
or mechanical concepts are used almost completely in theories 
built around the auditory and visual processes. Examine almost any 
explanatory attempts of the sensory processes and one finds a heavy 
neuro-physio-mechanical component. In the areas in which em- 
pirical knowledge is not so fully developed we may well expect less 
emphasis on the neurophysiological level. Whether we would be 
better off in the long run to junk all our psychological constructs 
and go to the neurophysiological level (as remote as it may seem 
for many phenomena) is not a matter for group decision. I am sure, 
and would hope, that we can have theorizing at all levels of dis- 
course. As the sciences slowly unify through overlapping concepts 
we may see a concomitantly gradual but slow progress toward 
neuro-physiological-chemical-atomic reduction. But this, if true, 
represents a later outcome of scientific endeavors; it does not 
prescribe what the efforts of any scientist shall be at the moment. 
The Nature of Some Explanatory 
Attempts 

INTRODUCTION 


The previous chapter, while intended 
to examine some differences among concepts and thus help 
us to understand certain intellectual activities of the scientist, 
necessarily introduced some preliminary notions of explanation. The 
present chapter has as its goal a deepening of our comprehension of 
the explanatory attempts prevalent in psychology. I have made some 
distinctions among concepts; these distinctions were intended to 
apply to a concept when it was first introduced by the scientist. I 
also indicated that there may be immigration of concepts from one 
level to another as explanatory attempts grow. An explanatory idea 
is rarely introduced and maintained unaltered. Even in its most 
strictly abstract form it may grow in assigned properties as it reaches 
out to encompass a greater range of empirical phenomena than was 
originally intended. Or, it may shrink in role as other concepts re- 
place some of its functions. So also we find even our most empirical 
concepts incorporated into explanatory attempts so that they too 
may acquire attributed properties or characteristics beyond those 
indicated by the defining operations. It is a by-product of the pres- 
ent chapter to show the nonstatic character of concepts in the hands 
of theoreticians. 

What is explanation? I think I have quite deftly skirted this direct 
question thus far but this source of ambiguity cannot be tolerated 
any longer. To answer it as best I can for our purposes, I shall 
simply make some bald assertions. Within the vision of any one 
scientist explanation has no end. This is true whether we think of as 
yet undiscovered phenomena which must be explained or which will 
serve to explain, or whether we think in terms of ultimate reduction 
through the pyramid of the sciences even as we now know them. So, 
what can I assert? I can assert that scientists engage in activities the 
outcome of which does two things. First, these activities reduce the 
number of independent phenomena which require explanation. 
Secondly, they reduce the number of assumptions necessary to 
deduce or account for known phenomena. Thus, although we cannot 
tell when we arrive at the ultimate of explanation, namely, an ac- 
counting of all phenomena with the fewest possible assumptions, we 
can, within the orientation implied by the goal, recognize activities 
that are commensurate with the orientation. It is my purpose here 
to sample some of these activities. 

One rough dimension which reflects differences among many 
explanatory attempts is the number of postulated processes (Level 4) 
involved in the attempt. Perhaps it would be better to say that the 
dimension represents the ratio between the number of postulated 
processes and number of empirical phenomena (Levels 2 or 3) which 
enter into the explanatory attempt. As in all sectioning attempts 
this one will provide only crude distinctions but it will be satis- 
factory for my purpose. Therefore, the major part of the chapter 
will be divided into three sections, namely, empirical explanation, 
mixed empirical-postulational explanation, and postulational 
explanation. Within these sections there will also be some other differences 
which I will point out as we go along. Following these three sections 
I will discuss some other kinds of explanatory attempts which do not 
easily fit along the rough dimension noted above. Finally, I find it 
necessary to make a number of general remarks in order to complete 
these three chapters on theory or explanation. 

EMPIRICAL EXPLANATION 

I suspect that the two words, “empirical explanation,” will to 
many seem contradictory. Yet, within the broad limits I have set 
for trying to understand explanatory attempts in psychology, many 
activities may be described by these two words. In the previous 
chapter, I discussed the idea of operational identification. In its most 
simple and direct form, operational identification is the minimum 
activity which I allow as a form of empirical explanation. But, there 
are less obvious forms of empirical explanation which also must be 
discussed. As a preview, let me note three possible outcomes of 
empirical explanation, which, while not independent of each other, 
ought to be noted separately. 

First of all, empirical explanation keeps the number of independ- 
ent phenomena requiring explanation to a minimum. Secondly, as 
an outgrowth of persistent attempts at empirical explanation, a given 
phenomenon may acquire a great deal of generality; that is, it is 
shown to be a basic behavioral phenomenon in the sense that it 
occurs under a wide variety of circumstances. Finally, such a 
generalized phenomenon, along with its relationships to stimulus 
variables, may become a principle in an explanatory system whereby, 
along with other principles (empirical or postulated), deductions of 
other phenomena are possible. The first two outcomes will be 
apparent in the illustrations of the present section, the third outcome 
will be illustrated in the following section. Our need at the present, 
then, is to look at illustrations of the scientist’s activities which 
result in what I am calling empirical explanation. 

1. In the “pure” case of operational identification note is taken of 
the fact that a set of operations used to define one phenomenon are 
identical or nearly so with a set used to define an already established 
phenomenon. It is then simply asserted that we are dealing with a 
single phenomenon. If the comparability of operations is apparent so 
that the community of scientists will accept the identification, 
operational reduction is accomplished. Thus, when Zeller {2$) noted 
that his operations, set up to study repression, were not critically 
different from those used to study motivation, he concluded that he 
could not proclaim the experimental isolation of a new phenomenon. 
Of course, when operations are so similar that 
they can be said to be studying the same phenomenon by fiat or 
acclamation, this is usually noted before the researches are carried 
out. That is, it is noted that the critical variable used in defining a 
phenomenon is also the critical variable in the research about to be 
done, and the identification is accomplished. The illustration given 
in the previous chapter concerning the number of particular ways 
in which blocking of goal-directed responses may occur to meet the 
definition of frustration is sufficient and we need not dwell on this 
separate. So also, if we find that incidental learning and R-S learning 
do not respond to variables in the same manner, these also must be 
kept separate. On the other hand, if behavior does change in much 
the same way when the same variables are manipulated in the two 
situations, I think we would arrive at a conclusion that we are 
essentially dealing with the same processes and that the difference in 
the two operations is behaviorally irrelevant. Thus, by such proce- 
dures, we may avoid talk of two concepts where one is sufficient. 

Let us be sure we understand the implications of operational 
identification as a means of empirical explanation, whether this 
operational reduction occurs by fiat or as a result of the sort of 
research described above in the case of incidental learning and R-S 
learning. The sole initial consequence of operational identification 
is to keep the number of independent behavioral phenomena to a 
minimum and in this sense only does it have explanatory value. For 
example, even if R-S learning is shown to be a simple manifestation 
of incidental learning, the latter having concept precedence, inci- 
dental learning per se is not explained. But, the pervasiveness or 
generality of incidental learning is increased so that it becomes 
apparent that when it is explained a rather large chunk of behavior 
will be included under the explanatory system. 

3. Operational identification does not always follow the order 
of events indicated above; that is, it does not always occur by identi- 
fying a “new” phenomenon as being in fact produced by operations 
comparable to an already established phenomenon. There are in- 
stances in which one attempts to establish a new phenomenon and 
then show that it will account for (reproduces the operations of) 
an already established phenomenon. This peculiar reversal of proce- 
dure might seem contrary to my “law” of concept precedence. But, 
let us get an illustration before us and then evaluate the implications. 

A universal phenomenon of serial learning is the bowed serial- 
position curve. Items at the beginning and end of a list are learned 
rapidly, those in the middle most slowly. More specifically, the item 
just past the middle is learned most slowly so that the serial-position 
curve is nonsymmetrical— it is skewed. Explanations of this skewness 
have been attempted (e.g., ij) using postulated processes, but in one 
approach to the problem an attempt was made to account for it by 
showing it was simply the reflection of another empirical 
phenomenon but one which had not yet been demonstrated (21). The 
reasoning was about as follows. The serial position curve makes it 
possible to say in a factual manner that in learning a serial list the 
subject acquires items in a forward direction and also in a backward 
direction. Thus, in a 10-item list, as learning takes place Item 1 will 
elicit Item 2 before Item 2 will elicit Item 3 and so on for forward 
learning. Item 9 will elicit Item 10 before 8 will elicit 9 and so on, in 
the backward direction. The forward and backward learning situa- 
tions are not operationally comparable. In forward learning a 
response becomes a stimulus for the next response whereas in backward 
learning a stimulus becomes a response for the preceding stimulus. 
Now, after making this analysis, the investigator noted that if back- 
ward learning took place more slowly than forward learning the 
skewness in the serial position curve would be simply a consequence 
of this fact. However, it would be necessary to show that backward 
learning did indeed take place more slowly than did forward 
learning and this had to be demonstrated outside the serial-learning 
situation. For, if a difference in forward and backward learning is 
to be used to account for skewness, then the difference must be 
independent of a situation in which skewness would inevitably occur. 
This was tested and the results showed that backward learning did 
take place more slowly than did forward learning. 
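
The reasoning above can be sketched numerically. What follows is only an illustrative sketch under stated assumptions, not the cited investigator's analysis: it assumes that items are acquired inward from each end of the list at a constant rate, and that an item counts as learned as soon as either the forward or the backward chain reaches it. The function names and the rate values are hypothetical and were chosen purely for illustration.

```python
def trials_to_learn(n_items, forward_rate, backward_rate):
    """Trial on which each serial position is first acquired.

    Assumption (illustrative only): the forward chain advances from
    Item 1 at `forward_rate` items per trial, the backward chain advances
    from the last item at `backward_rate` items per trial, and an item is
    learned when whichever chain arrives first reaches it.
    """
    trials = []
    for pos in range(1, n_items + 1):
        via_forward = pos / forward_rate
        via_backward = (n_items + 1 - pos) / backward_rate
        trials.append(min(via_forward, via_backward))
    return trials


def hardest_position(n_items, forward_rate, backward_rate):
    """Serial position (1-based) that is acquired last."""
    trials = trials_to_learn(n_items, forward_rate, backward_rate)
    return trials.index(max(trials)) + 1


if __name__ == "__main__":
    # Equal rates give a symmetrical bowed curve.
    print(trials_to_learn(10, 1.0, 1.0))
    # Slower backward learning pushes the hardest item past the middle.
    print(hardest_position(10, 1.0, 0.5))
```

Under these assumptions, equal forward and backward rates yield a symmetrical curve, whereas a slower backward rate moves the most slowly learned item beyond the middle of the list, which is the skewness in question.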

Having established the difference in forward and backward 
learning, what does the investigator say? Essentially what he can conclude 
is that the operations of serial learning allow forward and backward 
learning to take place; backward learning has been shown to be 
slower than forward learning. Therefore, the skewness in the serial 
position curve is explained at the empirical level (is operationally 
identified with) forward and backward learning. When the dif- 
ference in forward and backward learning is explained so also will 
be the skewness in the serial-position curve. 

But what about this business of concept precedence which I 
insisted upon when discussing operational definitions in Chapter 5? 
Doesn’t the present illustration deny concept precedence? Not 
basically, although I think even if it did we should allow for some 
flexibility in such arbitrarily trumped-up rules. The idea of concept 
precedence indicated that when two phenomena had been identified 
with essentially the same set of operations, precedence is given the 
one which has greatest generality. Essentially this means that priority 
is given the phenomenon which will occur in a situation in which the 
other could not, by its conceptualized nature, possibly occur. Thus, 
in the present illustration, “skewness” could not reasonably be ex- 
pected to occur in a situation that is independent of serial learning. 
Skewness is tied to serial learning. But, the difference between for- 
ward and backward learning is not so tied; it occurs independently 
of the serial-learning situation, and therefore has precedence. Prece- 
dence in situations like this is not achieved by temporal priority of 
discovery but by generality of the phenomenon involved. I think 
you would agree that simply because skewness was discovered 
earlier than was the difference in forward and backward learning we 
should not insist on saying that skewness caused the difference in 
forward and backward learning nor even that differences in forward 
and backward learning are simply a manifestation of skewness. 

Although I shall not give any detail, I would like to call attention 
to the fact that explanations of spread of effect based on number 
biases (e.g., 16) is a case of operational identification in which the 
empirical demonstration of number biases followed spread of effect. 

Let us move along to other forms of empirical explanation which, 
though in the long run representing a form of operational identifica- 
tion, are somewhat more subtle than those we have examined thus 
far. 

4. We have seen that operational identification as a means of 
explanation ties directly into our previous material on operational 
definitions per se. And, since operational definitions are intimately 
related to basic matters of experimental design, we might expect that 
certain matters of design must inevitably arise when discussing forms 
of empirical explanation where this is achieved largely by opera- 
tional identification. One sometimes hears the criticism that we are 
somewhat overly concerned with details of experimental design in 
our quest for the purification of relationships and phenomena. We 
are offered the other alternative of looking for general principles 
which will subsume the detailed findings; the general principles will 
supersede any of the minute purifications resulting from close at- 
tention to the details of our research operations. None would deny 
that the search for general principles is a primary goal of science. 
But I would venture an opinion that one possible way of attaining 
am interested in the idea here that purifying designs may keep the 
number of independent phenomena to a minimum as well as exhibit 
the generality of certain phenomena.) There are other instances 
which also aid in arriving at the conclusion that secondary reward is 
a very widespread phenomenon. It has now been used essentially as 
an empirical postulate in certain explanatory systems (e.g., 24) from 
which, in conjunction with other postulates, other phenomena can 
be deduced. The generality of the phenomenon of secondary reward, 
while certainly confirmed by many direct tests, has also been ex- 
tended by perceptive investigators who have noted that procedures 
in certain experiments, designed to study different phenomena, 
were not pure in the sense that they did not exclude the operations 
used to demonstrate secondary reward. Generality is added by 
default. 

As a matter of fact, some illustrations of errors in design in 
Chapter 5 are really illustrations of empirical explanation. That is, 
the scientist simply notes that operations presumed to demonstrate a 
new phenomenon (again let me remind you that I use phenomenon 
in a very general way to include even a simple empirical relation- 
ship) did not in effect do so because they did not eliminate the pos- 
sibility that the results represent an already established phenomenon 
that was allowed to occur. So we see that our designs must keep up 
with empirical knowledge; the more phenomena we define in an 
independent fashion, the “purer” must be our subsequent research if 
we expect to demonstrate unequivocally a new phenomenon. But 
certainly we must admit that there is a point where the subtlety is so 
great that to call the confounding an “error” in design is manifestly 
unfair. I would like to give an illustration where this seems to be the 
case. 

5. In 1939 (4) an experiment was reported that demonstrated 
what has come to be known as sensory preconditioning. In demon- 
strating this phenomenon two stimuli are paired together over and 
over. That is, a light and a bell might be presented simultaneously to 
a dog 200 to 300 times. Then, one of the stimuli is used as a con- 
ditioned stimulus in developing a conditioned response, say, leg 
withdrawal. Then on test trials the other stimulus is presented. If 
foot withdrawal occurs with greater frequency than appropriate 
control frequencies, sensory preconditioning is defined. This phenomenon 
has in general been resistant to attempts to incorporate it 
into certain learning theories. In 1951 other investigators (28) in 
studying the operations noted a certain basic similarity between them 
and other sets of operations used to establish secondary stimulus 
generalization (e.g., 22), this latter phenomenon having been demon- 
strated in a not too convincing manner but enough so to insure its 
acceptance as a reliable phenomenon. Now the investigators set up 
a situation in which a common response would be attached to two 
stimuli, as is the case in secondary stimulus generalization, but which 
could also be true for sensory preconditioning. But, in one condi- 
tion the stimuli were presented simultaneously as in sensory pre- 
conditioning and in another separately, as in secondary stimulus 
generalization. In the second situation, the usual conception of 
sensory preconditioning would not lead one to expect positive re- 
sults, i.e., that sensory preconditioning would occur. Yet, both 
situations would have the common operation of having the same 
response to both stimuli; this commonality was thought to be the 
critical part of the operations for both phenomena. If this is the case, 
both situations should show about the same frequency of response 
on test trials following the use of one as a conditioned stimulus and 
the other as the test stimulus, and both should be greater than the 
controls. This is exactly what happened, and the investigators con- 
cluded that sensory preconditioning is just a special case of secondary 
stimulus generalization; in both instances the critical operation is the 
making of a common response to two different stimuli. I might 
mention that this is quite obviously a rather indirect form of opera- 
tional identification and has just a small assumptive component, 
namely, that in the sensory preconditioning situation a common 
response does occur to both stimuli. I doubt if anyone would 
seriously suggest that this assumption is not warranted. 

6. Let me now turn to another case of operational identifica- 
tion in which a behavioral phenomenon may be reduced to reflect 
completely and faithfully the operation of a simple physiological 
principle. I say “may” because the reduction process still has to be 
carried out and may not succeed. But, our main concern is with the 
logic of the intent. 
but the one I will use by way of illustration is foveal contrast. Into 
one eye two small square fields of light are fed, one called the 
inducing field and the other the test field. The amount of contrast 
varies as a function of several factors. For our purposes consider the 
squares to be joined on one side. As the inducing field gets brighter 
than the test field, the test field appears to get dimmer. The amount 
by which it appears to get dimmer is determined by the subject 
adjusting the brightness of a square focused on the fovea of the 
other eye. The square, of course, is of the same size as each of the 
squares on the other fovea. The adjustable brightness square is 
changed until the subject indicates that its brightness is equal to the 
brightness of the test field in the other eye. 

The empirical explanation which might serve adequately for this 
phenomenon is scatter. Scatter is simply the name for the fact that 
the vitreous humour of the eye does not transmit light perfectly, 
slight imperfections allowing the light to scatter somewhat. Thus, if 
an image with perfectly sharp contours is projected into the eye, 
because of scatter it will not arrive on the fovea with the same such 
sharpness. That such scattering takes place is a well-established fact. 
It is believed that brightness contrast may be entirely due to this 
scatter. To try to effect such an operational identification from a 
behavioral phenomenon to a physiological phenomenon, two steps 
are planned. First, the variables which affect brightness contrast will 
be manipulated again but this time scatter will be measured directly. 
If the variables should affect scatter in the same way they affect 
brightness contrast, and if the entire amount of brightness contrast 
can be shown to be isomorphic with the scatter, it could be con- 
cluded that the visual system beyond the fovea is transmitting per- 
fectly that which falls upon it. Secondly, one could separate out 
subjects who have little scatter and those who have a great deal; 
two such groups should differ comparably on brightness contrast 
tests. Again, let me say that whether this will be successful or not is 
not the issue in our using the illustration here. It is the intent of the 
investigator that is important for the present discussion. Actually, it 
would be a rare instance if a behavioral phenomenon could be 
shown to be completely isomorphic to a physiological phenomenon. 
Usually, discrepancies develop when it has been attempted to draw 
parallels between the two levels of description (as we shall see 
properties of sensory organization and were relatively unsullied by 
learning. In such conflicts, of course, the rigor of research must 
be substituted for the rancor of words. Empirical phenomena which 
may at first hold promise of great generality in the investigators’ 
conceptual speculation may eventually be cut back empirically so 
that the phenomenon will be shown to occur only under a highly 
restricted set of operations. For example, in my own area of research, 
distributed practice was thought to be superior to massed practice 
for learning verbal material of many kinds presented in many ways. 
Research shows now that this is simply not the case; the phenomenon 
can be produced only under a highly specific set of conditions. The 
history of science shows many phenomena which in the initial stages 
of work upon them gave little indication or promise of achieving the 
great generality which they later attained. 

It seems to me that in the stage of science where there is emphasis 
on empirical growth (such as I judge psychology to be in at the 
present time) we may expect many of these attempts at empirical
extension to break out. The scientist seeks general laws in
attempts to avoid being faced continually with pesky isolated sets of
data. If, through operational identification, these facts can be shown 
to be manifestations of a few basic phenomena the science advances 
rapidly. One can note these tentative probings in several areas in our 
literature at the present time. The recent emphasis on motivational 
factors in perception is one illustration. As another, one of my
colleagues (J) systematically compared the influence of certain
variables on some motor learning phenomena and the influence of these
same variables on certain perceptual phenomena. The striking
comparability has led him to suggest that we are not dealing with
disparate functions in the two fields. Helson’s adaptation level (13),
which is basically an empirical phenomenon, has been demonstrated
in several judgmental situations and it is tentatively suggested that it
will occur in a much wider range of situations. Berg (5) has
suggested that response sets, shown in a number of situations, may also
operate in personality and interest inventories and that the scores on
these tests may reflect largely differential response sets which are
independent of the content of the test item. I suspect that when the
implication of this attempt at empirical extension is fully realized
we may expect words to fly.
So we see that throughout our science, attempts at operational
identification are being carried on constantly. Their consequence is 
to keep the number of independent phenomena down to a minimum 
and to determine the degree of generality of these phenomena. They 
result from what may itself be a fundamental principle of behavior, 
namely, to think of new things in terms of things about which we 
already know. 


MIXED EMPIRICAL-POSTULATIONAL EXPLANATION 

In empirical explanation as discussed in the previous section 
neither postulated processes nor hypothetical properties were in- 
volved. If there is a hypothetical component involved in empirical 
explanation it is usually no more than a working hypothesis that 
“this” phenomenon will occur as a result of this set of operations. 
Usually there is no deductive component directly involved in the 
operational identification. I have tried to show that operational iden- 
tification is a basic scientific activity, not mere pedantry, and that 
these simple working hypotheses that often guide operational
reduction may have startling implications when they trespass on
supposedly well-conceptualized areas of behavior. In the present section,
while we fully realize the futility of trying to hold to clear-cut 
categories in material of this sort, we propose to proceed to a dis- 
cussion of explanatory attempts which, like operational reduction, 
start with the working hypothesis which directs operational iden- 
tification. But, in addition, since simple operational identification 
will not incorporate the phenomena under consideration,
postulational steps are necessary.

I suspect that empirical-postulational explanation has much more of
the aura of “true” theory than did empirical explanation. Furthermore,
explanatory attempts by empirical-postulational techniques are [...]
psychology and I will not be able to give this type of explanation
adequate representation in the space I feel I can allot [...] more
complicated than empirical explanations and even though I cut corners
in giving illustrations [...] take considerable space. We have [...]
broad areas of behavior into an [...]
and postulation. For example, Personality and Psychotherapy by
Miller and Dollard (7) assumes the reliability of a wide variety of 
clinical phenomena, then by a deft mixture of operational
identification and postulated processes and relationships attempts to account
for these phenomena. I cannot, of course, review such a system
and must stick to my policy of examining miniature explanatory 
attempts. 

1. In the general area of learning and retention, I suppose that
no empirical phenomenon enters into more explanatory systems than
does stimulus generalization. The use of this phenomenon in a
miniature explanatory system may be illustrated by Spence's classic
account of certain facts of discrimination learning in animals (23).
More specifically, it was an accounting of discrimination-learning
facts which suggested that animals learn relations among stimuli,
thus leading to what is known as transposition behavior. If an animal
learns to approach a 5-unit stimulus and avoid a 10-unit stimulus,
then when a 10-unit stimulus and a 15-unit stimulus are presented
together, transposition is said to occur if the animal chooses the
10-unit stimulus. Such behavior led to the belief by some that the
animal learned to “choose the smaller or weaker of two stimuli” so
that when the 10- and 15-unit stimuli were presented together it
chose the smaller. Spence’s approach was to show that this
interpretation is not necessarily required.

First, the phenomenon of stimulus generalization was assumed to 
be operating in such learning situations. Since stimulus generaliza- 
tion had been independently demonstrated in many learning situa- 
tions the assumption is hardly open to question. When an animal 
is rewarded for making an approach response to a stimulus (the 
correct response) a gradient of stimulus generalization can be shown 
empirically to exist. There was fragmentary evidence that a similar 
gradient might exist around the negative stimulus, which means that 
not only did the animal learn to avoid the particular stimulus which 
was not rewarded but also other stimuli similar to it, with the 
tendency to avoid being less and less as stimuli became less and less 
similar to the negative training stimulus. This is the entire empirical 
content of the theory. But in addition, certain characteristics 
were postulated for these phenomena: (a) particular shapes for
the positive and negative gradients of stimulus generalization
were postulated; (b) a postulated interaction that whenever
the two gradients overlap the response tendencies summate
algebraically; (c) the animal will always respond to the stimulus with
the greatest net positive habit strength. With this Spence
could predict transposition behavior and could also predict in what
situations transposition behavior would fail to occur. This
prediction would not be expected by one who used the relational
approach; thus, supposedly, a test of two opposing conceptions was
possible.
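
As a rough illustration only, the three postulated characteristics can be
worked through numerically. The sketch below is not Spence's own
formulation: the bell-shaped gradients, the peak heights, the width, and
the function names are all assumptions chosen for the example. Only the
5-unit positive stimulus and the 10-unit negative stimulus come from the
text.

```python
import math

# Assumed, illustrative parameters; none of these values come from Spence.
POSITIVE_STIMULUS = 5.0    # rewarded training stimulus (units, from the text)
NEGATIVE_STIMULUS = 10.0   # nonrewarded training stimulus (units, from the text)
EXCITATORY_PEAK = 10.0     # height of the positive gradient (assumed)
INHIBITORY_PEAK = 6.0      # height of the negative gradient (assumed)
GRADIENT_WIDTH = 4.0       # spread of both gradients (assumed)


def gradient(stimulus, center, peak, width=GRADIENT_WIDTH):
    """Postulate (a): a bell-shaped generalization gradient (assumed form)."""
    return peak * math.exp(-((stimulus - center) ** 2) / (2 * width ** 2))


def net_strength(stimulus):
    """Postulate (b): approach and avoidance tendencies summate algebraically."""
    approach = gradient(stimulus, POSITIVE_STIMULUS, EXCITATORY_PEAK)
    avoidance = gradient(stimulus, NEGATIVE_STIMULUS, INHIBITORY_PEAK)
    return approach - avoidance


def choose(stimulus_a, stimulus_b):
    """Postulate (c): respond to the stimulus with the greater net strength."""
    return max(stimulus_a, stimulus_b, key=net_strength)


if __name__ == "__main__":
    for pair in [(5.0, 10.0), (10.0, 15.0), (20.0, 25.0)]:
        print(pair, "->", choose(*pair))
```

Under these assumed values the sketch chooses the 10-unit stimulus from
the 10- and 15-unit pair, which is transposition, but chooses the 25-unit
stimulus from a 20- and 25-unit pair, where a purely relational rule
would choose the smaller. Different assumed shapes would move the point
at which transposition breaks down.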

Now what did Spence do in constructing this theory? He took
an empirical phenomenon, stimulus generalization, [...] call a
Level-2 concept. But, in assigning the hypothetical properties
(particular shapes to the [...] action) he is using what I have
called [...] Whether we should now call stimulus [...] or not, or
whether one even wants to think [...] relatively unimportant as long
as we see [...] scientist’s thinking which is involved. I have [...]
with static concepts; this is an illustration [...] not. I would also
like to note again, as I did in the [...] that explanatory attempts
involving postulated processes [...] processes and a postulated [...]

2. We have already seen how Hebb’s (12) theorizing leads him
to the neurophysiological level. It [...] whether the scientist [...]
neurophysiological level or whether he [...] abstract level, the
fundamental characteristics of [...] do not differ. Research
resulting from the different [...] may differ in that the
neurophysiological reductionist is [...] about the nature [...]
attempt to determine independently the validity of these [...]
physiological research. But, [...] take a look at a fragment of
Hebb’s [...] and hope we won’t be doing him a serious injustice as
we illustrate the empirical-postulational approach. For [...] primary
behavioral phenomenon which Hebb [...] trying to explain in this
system within a system is memory. [...] name for his essential
explanatory mechanism is the cell assembly. What he tries to do is
explain how memories are established. To do this he believes he
must have a mechanism for a temporary reverberation of the neural 
trace resulting from a perception. Also, there must be some growth 
change during the reverberation which effects a permanent modi- 
fication (memory). What does he do to accomplish this? First, he 
gathers together all the neurophysiological facts he can which might 
be relevant, i.e., the fact that neurons have certain structures, that 
there are synaptic junctions between cells, that one cell may fire 
another, and so on. He also shows that evidence is fairly clear on 
the fact that there is a reverberatory trace occurring among simple 
neuron circuits under certain conditions. But, also looking at the 
facts on refractory phase, and the fact that a simple circuit of 
neurons could not possibly reverberate long enough to establish per- 
manent change, he makes certain assumptions or postulates. The 
essential postulate is that if neuron circuits converged in a certain 
way, and if alternative pathways of reverberation were possible as 
a consequence, permanent change could be set up to account for 
memory and certain allied phenomena. This is a simplified version 
and perhaps not exactly accurate, but it is a close enough approxi- 
mation to show the nature of Hebb’s merging of neurophysiological 
fact and his fiction (postulated characteristics) about them to
account for certain behavioral phenomena. Possibly it is not quite
accurate to say that this accounts for the behavioral phenomena; it
is perhaps more accurate to say that the proposed neurophysiological
events underlie the behavioral phenomena.

Considering Hebb’s and Spence’s illustrations of theorizing to- 
gether, I think there are three points that I would like to make 
partly by way of reminders of previous discussion. First, why did
Spence and why did Hebb postulate the particular processes they
did? The reason is, as we have pointed out in the previous chapter,
that postulated characteristics are assigned in such a way as to
account for the behavioral phenomena. In this sense the reasoning
reflects the inductive-deductive circles I have talked about earlier.
We must not allow ourselves to believe that in the initial
formulation of such an explanatory attempt it is anything but an ad-hoc
formulation; it is built to accommodate certain facts and therefore 
must accommodate them. Only when it is shown that it will
incorporate facts not used in its construction, and which are operationally
independent from those so used, does the explanation lose its “ad
hocness” and become a predictive system. And again, I must insist
that to say an explanation is ad hoc is not to damn either it or the
scientific activity which led to it. For, to find the proper
combination of postulated characteristics (in these cases used in conjunction
with certain empirical relationships) that will generate the known 
facts is not an easy job as long as the scientist adheres to the prin- 
ciple that the number of postulated processes must be appreciably 
fewer than the number of operationally independent phenomena for 
which these processes are to account. 

Secondly, it might seem at first in looking at Spence’s approach 
versus Hebb’s approach that Hebb’s postulational attempts are much 
more circumscribed by the situation than is Spence’s. That is, it 
might be thought that Hebb must work at the postulational level 
within a very narrow range of possibilities because he is so re- 
stricted by neurophysiological facts against which his postulates 
must not run contrary. If this be true, however, it must be clear 
that this is not a consequence of reductionism versus
nonreductionism. In the particular kind of explanation which I am
considering (combined empirical and postulational) the scientist is bound
only by the known empirical phenomena (and their relationships
with known variables) which are used as the empirical component
of the explanatory attempt. Thus, while Spence postulated the
specific form of the generalization gradient, it did not contradict the
essential but general fact that the gradient falls as some function of 
similarity to the training stimulus. Likewise, Hebb did not postu- 
late any characteristics which contradicted any known character- 
istics of the nervous impulse. Thus, how much one may be initially 
restricted in his explanatory attempts depends upon the amount of 
empirical content he starts with. We might have instances in which
the empirical content was low, hence little restriction is imposed
on the nature of the postulated processes. On the other hand, if we
bring a great deal of empirical content as a base for an explanatory
system we may (but do not necessarily) restrict the diversity of the
postulational attempts. One might contrast Spence’s theory of
discrimination learning where the empirical content is low and
postulated content is high, with his theory of delay of reward learning
where the empirical content is relatively high and the postulated
relatively low (24). I do not know, however, whether or not these
two parts of an explanatory system are always reciprocals; the 
only point I wish to make is that it does not seem to me that 
the reductionist is any more restricted in his fantasies than is 
the nonreductionist. Any differences in restriction that occur in 
empirical-postulational explanation probably result from differ- 
ences in extensiveness of empirical phenomena and laws about them 
that are put into the explanatory system in the first place. 

Third, I should note briefly how the empirical-postulational ex- 
planatory attempts may lead to two different ways of verification. 
First, we may in many cases test directly the postulated character- 
istics assigned the empirical phenomena of the system. Thus, Spence 
postulated a specific form of the generalization gradient and one 
could test this directly. If such a test showed that the postulated 
shape does not exist in fact then the system would have to be 
altered to encompass this necessary change. So also may neuro- 
physiological evidence confirm or deny some of Hebb’s postulated 
processes. Secondly, one may empirically test deductions which stem 
from the system, and I think we would agree that this is less direct 
(although no more or less valid or critical) than the direct tests of 
assumptions. Either method leads to clarification and evaluation of 
explanatory attempts. Indeed, if in the neurophysiological form of 
theorizing all assumptions are tested and confirmed then the explana- 
tory system becomes no more than an elaborate form of strict em- 
pirical explanation arrived at in a different manner than was dis- 
cussed for such forms of explanation. However, the same degree of 
certainty can be attained by positive tests of postulated character- 
istics at the abstract level although I suspect that for equal degrees 
of certainty a greater number of tests is required when dealing with
abstract processes. But let me now turn back to further illustrations 
of empirical-postulational explanation. 

3. As I have suggested earlier, explanatory attempts dealing with
behavior associated with sensory processes have rather consistently
turned to neurophysiological mechanisms. Most of these
explanations must necessarily take on the combined empirical-postulational
approach. The empirical content usually refers to neurophysiological 
(or chemical, or electrical, or mechanical) facts and the postulational 
procedures are used to fill in the gaps not covered by these facts.

phenomena in the area of rote learning. Subsequent work by Hull
uses both empirical and postulated laws in the systems. For our pur- 
poses here, we will examine a miniature system dealing with a rela- 
tively restricted area of animal behavior. 

In the area of animal learning there is a well-established phenom- 
enon of alternation behavior. This behavior is most clearly seen in a 
simple T-maze. If food is placed in both arms of the maze and the 
animal given a series of massed trials, it will tend to alternate be- 
tween the two arms on successive trials. Several variables influence 
the extent or amount of alternation. Different theoretical account- 
ings have been offered for these facts; the one I wish to discuss is a 
near pure case of postulational explanation. This theory, developed 
by Glanzer (11), is presented as one postulate with several parts.
However, I shall outline it here in different form and omit certain 
aspects which are not necessary for our purpose. 

Postulate 1. Each moment an organism perceives a stimulus-object
there develops a quantity of stimulus satiation to the object. Stimulus
satiation reduces the organism’s tendency to make any response to
the object.

Postulate 2. The same amount of stimulus satiation to the object
develops in each successive moment. The total amount developed,
therefore, is an increasing linear function of time. There is a loss of
part of each quantity of stimulus satiation in each successive moment.
The amount of stimulus satiation remaining from each quantity is
a decreasing negative exponential function of time.

Postulate 3. Stimulus satiation developed to an object will be
generalized to other stimulus objects as a direct function of similarity
of the objects.

Postulate 4. Various quantities of stimulus satiation combine
additively.
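
The four postulates can be given a small numerical sketch. This is an
illustration under stated assumptions, not Glanzer's own notation: the
unit quantity added per moment, the decay rate, the proportional
generalization rule, and the function names are all hypothetical.

```python
import math


def remaining_satiation(exposure_moments, delay_moments=0,
                        quantity=1.0, decay_rate=0.2):
    """Satiation left after an exposure followed by a delay.

    Postulates 1 and 2: one equal quantity develops in each moment of
    exposure, and each quantity then decays as a negative exponential
    function of its age. Postulate 4: the quantities combine additively.
    The quantity and decay_rate values are assumed for illustration.
    """
    total = 0.0
    for moment in range(exposure_moments):
        # Age of this quantity, in moments, at the time of measurement.
        age = delay_moments + (exposure_moments - 1 - moment)
        total += quantity * math.exp(-decay_rate * age)
    return total


def generalized_satiation(satiation, similarity):
    """Postulate 3: satiation spreads to other objects with similarity.

    Assumes similarity is scaled from 0 to 1 and that the "direct
    function" is simple proportionality.
    """
    return similarity * satiation


if __name__ == "__main__":
    for delay in (0, 5, 10):
        print(delay, round(remaining_satiation(5, delay), 3))
```

With the decay rate set to zero the total is simply the quantity times
the number of moments, the linear growth named in Postulate 2. With any
positive decay rate the remaining satiation shrinks as the delay grows,
which is the basis for expecting less alternation with longer intervals
between trials.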

Now, let us see what we have. We are trying to give an
accounting of alternation phenomena, these phenomena usually being
defined as Level-2 concepts. The single hypothetical process which
is involved is stimulus satiation; this is clearly a Level-4 concept.
The characteristics assigned stimulus satiation by postulation are
those characteristics needed to account for the fact of alternation
behavior and the influence of certain stimulus variables on its
amount. Postulate 1 as stated assumes that an organism may develop

for it. It can also be seen that as time between trials increases,
alternation behavior will decrease, for the satiation will dissipate in the
interval between trials. A number of other established facts also 
follow. Then, Glanzer lists a number of predictions which stem 
from the theory which have not as yet been tested. He further ex- 
plores the implications of the postulates for situations with more than 
two alternatives. And he also suggests how other phenomena, not 
immediately seen to be related to alternation behavior, may be de- 
duced from the system. For example, exploratory behavior, usually 
thought of as a special kind of drive, may be deduced from the 
postulates. Finally, he makes tentative suggestions concerning cer- 
tain human behavior which might be subsumed under such a set 
of postulates. 

Now, whether one likes the approach Glanzer has used or not, 
and whether or not there are certain vaguenesses (e.g., how do we 
tell when perceiving is occurring?), I think we must admit that 
Glanzer has in general taken into account certain “rules” of theory 
construction which are often suggested. First, he has relatively few 
assumptions; he has shown how these assumptions may account for 
a number of facts about alternation behavior; he has indicated how 
implications of the theory may be tested by performing new sets of 
operations, and he explores the implications of the theory far beyond 
the facts associated with simple alternation behavior. Whether it is 
as satisfactory as other attempts in accounting for the same behavior 
is up to the experts in the particular field to decide. 

I shall give no further illustrations of the strict postulational 
approach. Our discussion of Level-4 concepts gave us considerable 
by way of illustrating the thinking which goes into these kinds of 
explanatory systems and furthermore, there is no essential difference 
between the combined empirical-postulational approach and the 
postulational once the postulates are brought together. It should be
noted, however, that in the sense of making direct tests of postu- 
lates, as we can when an empirical postulate is involved, it is not 
often possible when the approach is purely postulational. Many tests 
must be made in terms of implications of the interaction of the 
processes and their characteristics. 

Let me review what I have done thus far in this chapter. First I 
presented illustrations of what I have called empirical explanation,
from simple operational identification to more subtle

discussed we note this fundamental theme. Operational
identification (empirical explanation) is clearly this sort of thinking; and,
when postulated processes enter into explanatory systems the
characteristics of these processes almost inevitably are similar to
something about which we already know (20). It seems intellectually
compatible, almost to the point of necessity, that we think of new 
phenomena in terms of events and relationships and characteristics 
about which we already know. Yet the seeming inevitableness of 
this mode of thought should not make us complacent, for without 
doubt many of the great strokes of explanatory genius have come 
about because the scientist did break through this intellectual re- 
striction and allowed his imagination to focus on implications of rela- 
tionships foreign to those about which he had previously thought. 

For our purposes, the term “model” may be said to be introduced
in two different contexts. Whether or not these two contexts are
completely distinct is of little consequence; at least an examination
of these contexts will give us some insight into activities of research
psychologists which we have not hitherto discussed fully.

Research models. Research in a relatively new area of
investigation is seldom undertaken without some conceptual scheme in mind.
That is, it is seldom undertaken without some preconception as to
the nature of the phenomena and perhaps the processes lying behind
them. These predilections are usually lightly held but they do afford
the initial working hypotheses, i.e., what variables to investigate
initially. If one studied the personal history of the particular
scientist involved one could probably determine the source of these
orienting attitudes, as they have been recently called (y). Let me
give you an illustration of what I mean by these research models. In
recent years there has been a rather marked growth in interest
[...] processes experimentally. Several
investigators have offered research models which they believe might be
useful in the initial attacks on the area. Thus, Bartlett (^) views
thinking as having a counterpart in motor learning, and the research
he would undertake would be directed initially by this conception.
Kendler 0?) believes certain phenomena of simple conditioning
will be evident in problem-solving behavior. From such conceptions
certain variables are suggested as being important; hence the initial

inculcate the new area of research under the old and if an
explanatory system has been developed for the old it will serve for the new.
I think it is evident that when such identification is found to be even 
partially complete, we are dealing with a complex case of empirical 
explanation as I discussed it earlier in the chapter. For, the result 
is to keep the number of independent phenomena to a minimum by 
operational identification. So, the research model, introduced
originally as a device for getting research initiated, may result in
empirical explanation.

Let me turn to another possible outcome of research models. 
Let us assume that the model is a statistical, mathematical, or 
mechanical one. If research shows that behavior corresponds to the 
statistical, mathematical, or mechanical laws, the investigator may 
now begin to think of and use the model as an explanatory system. 
The statistical (or mathematical or mechanical) laws may be used 
to deduce additional behavioral laws. Thus, if certain basic
relationships [...] in the area of learning were [...]
relationships may be predicted for behavior [...]
additional relationships which hold for computing
systems. This procedure has not been very successful as yet

plays in such formulations can be found in several sources (e.g.,
Sj ^5).

I have said that research models may be mechanical (or electrical, 
or other) models. So, also, may such models be introduced as ex- 
planatory models. That is, they are introduced not because the 
scientist wants to use them as analogies from which to develop re- 
search problems, but because he believes they will explain the 
behavior involved. This may take two distinguishable turns. Sup- 
pose an electronic computer is used as an explanatory system for 
learning and retention phenomena. As discussed earlier, if the laws 
for behavior and for the computer are commensurate, then the
computer laws may be thought of as explaining the behavioral laws 
in the same sense that a mathematical system is said or might be said 
to explain behavior. Under such an orientation, one would not neces- 
sarily look (by research) for the neurophysiological counterparts of 
the computer; the laws for the computer and for the organism are 
postulated to be isomorphic and no inquiry is made as to how this 
comes about. In the same sense a mathematical model is postulated 
to be isomorphic with an area of behavior but it is meaningless to 
inquire as to how the mathematical system got that way. 

On the other hand, the scientist may use a mechanical model as an 
explanatory model and then set about to find neurophysiological 
counterparts to the elements of the mechanical system. Actually, as 
we have discussed earlier this is a form of empirical explanation by 
identification, and occurs quite frequently in miniature form where 
psychology and neurophysiology converge, most notably in the areas
of sensory processes and brain functions. For example, Kohler and
Wallach (15) postulated a model for the visual cortex. This model
was one of a particular kind of electrical field well understood by 
physicists. What these investigators said, in effect, was that if the 
cortex was such an electrical field then certain behavioral phenomena 
were understandable. That this model was not being used merely as 
a research model or as a formal model (as would be the case with 
a mathematical system) is shown by the fact that much effort of 
these investigators has been directed toward showing that such 
electrical fields do exist in the cortex. In short, it is thought of as a 
reductive explanatory model for behavioral phenomena. The model 
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we set do^vn some rules for “good” and “bad” explanatory' cfTorts 
which in no way hamper our conceptual imagination but will pro- 
vide ground rules of some kind? It seems to me that the areas of psy- 
chology differ so in their empirical development that such work- 
ing rules would be anachronistic in some areas and anticipatory in 
others. In the area of sensory processes almost any piece of research
that is well done has relevance for explanatory attempts already put 
forward. In the area of clinical behavior we are so immersed in try- 
ing to establish reliable phenomena that explanatory efforts pretend- 
ing any scope would be difficult to assess. In the area of learning, 
which lies somewhere between clinical and sensory processes in 
terms of empirical development, evaluation of explanatory efforts 
that lay claim to some scope has only recently been given sys-
tematic and comprehensive attention. An outline has been offered
for evaluating such theories (9, pp. xiii-xiv); some of the points in the
outline have been discussed here but the entire outline deserves study 
by those who might be interested in the rather formidable under- 
taking of assessing systematically an explanatory system of some 
scope. Nevertheless, even when faced with my own argument
against it, I have the temerity to make some comments and sug- 
gestions about explanatory procedures. Some of these, I feel sure, 
no one can disagree with; others might be decidedly controversial. 

1. When reporting research I would insist that we have an obliga-
tion to place the research in some sort of context reflecting pre-
vious work. This context may take either of two forms. It may be
a strictly empirical context in which the investigator makes an evalu- 
ation of just where the study fits, e.g., what gap is being filled, what 
empirical contradiction is trying to be resolved, and so on. This 
setting in the empirical context I judge to be obligatory. I have very 
little patience with research which is reported without reference to 
any other findings or other phenomena at the empirical level. 

The second context in which an experiment may be presented 
(and this should be in addition to the empirical context) is an ex- 
planatory context. Many experiments are done for the purpose of 
testing hypotheses derived from some explanatory attempt. There 
are certain dangers involved in these experiments. 

(a) The test may not lie within the boundaries specified by the
explanatory system. A system developed to account for certain

generalizations; he is trying to bring phenomena and their rela-
tionships together in some meaningful fashion. But, just how does
one go about this? Suppose that you have a set of reliable data in a 
given area and you want to bring this into some sort of explanatory 
order. How does one do this? I am in no position to tell how to
theorize as far as the development of imagination, broad empirical 
perspicuity, and so on, are involved. However, in looking at the 
development of various explanatory attempts in psychology, I may 
suggest some alternative approaches with priorities, and some ideas 
on what to do irrespective of the nature of the specific explanatory 
attempt one chooses. 

(a) I probably need not say, but shall anyhow, that we should 
carefully delineate the phenomena or relationships with which we 
wish to deal in our explanatory attempt. The phenomena should be 
operationally defined and their relationships with stimulus variables 
stated insofar as these are known. If the research has been systematic
and precise enough, these relationships may be quantitatively 
stated. 

(b) As a next step I think the search for empirical explanation
has very high priority. This is most likely to be fruitful if one is 
dealing with data from a relatively new area of research. One asks 
whether or not the phenomena at hand can be manifestations of 
already established phenomena. This requires careful study of the 
operations but sometimes simple operational identification as an
explanatory device is so obvious that it might be overlooked. It is my 
personal belief that no greater service can be rendered our science 
than by persistent attempts at empirical explanation. It may take 
additional research to establish the commonality of operations but 
this is true of any type of explanatory attempt. Now of course, if
one accomplishes empirical explanation one may wish to pursue 
the matter further and offer “higher” forms of explanation (e.g., 
postulational) to encompass the “old” and “new” phenomena if an 
adequate system is not available for the old phenomena. This is a 
matter to which I will turn in a moment. What I would caution is 
that we don’t jump to these higher forms of explanation without 
first considering carefully the possibility of empirical explanation. 
There are several illustrations in the literature of our science where 
a studious inspection of the operations defining a phenomenon would

the system. I suppose we should stick pretty close to this tenet, but
I am a little resistant to it. A theory with postulated physiological
processes might be untestable at the moment because of the state of
technological development. That is, the instruments or other tech- 
niques needed to test the theory may not be available at the present 
time but five years from now they may have been developed. Even 
at the psychological concept level, one need not assume that a 
theorist is so infinitely wise that he alone is the one to determine 
whether or not his system allows for independent tests. Others might 
see how tests could be made. Nevertheless, I suppose we must con- 
tinue to view not only the fact of whether or not a theory is test-
able but also the ease with which tests are generated, as prime criteria
of theory evaluation. 

3. It not infrequently comes to my attention that graduate stu- 
dents often try to establish their personal philosophies of explana-
tion in psychology by asserting they are “for” or “against” theory.
I wish the issues could be so simply resolved, but I think it is clear 
that they cannot. I think it is perfectly reasonable to expect to find 
certain explanatory methods, e.g., pure postulational, which are 
incompatible with one’s mode of thinking. But, to say that one is 
against theory is not consonant with being a scientist. For although 
we couldn’t arrive at any acceptable specific use of the word
“theory” it nevertheless always implies an attempt to bring order to
the world of empirical facts by abstracting out the commonalities
underlying the facts. All this means is that we are searching for
generalizations and this is science. When one asserts he is against
theory it usually means he is against a particular way of approach- 
ing the search for generalizations, and that is the most it can mean 
if one is to remain a scientist.

where the latter may have a very useful purpose and this purpose is
so recognized by the investigator. Let me illustrate what I mean by 
nonanalytical research and then discuss this matter of utilization of 
such research results. 

In the undergraduate course in experimental psychology which 
I teach I try to do at least one experiment which has features indi- 
cated by the following conditions. A simple task is used, such as 
cancellation, digit-symbol substitution, or reversed-alphabet print- 
ing. Two groups are matched on performance on the task chosen. 
Then the experimental group, working in isolation from the control 
group, is given a series of trials on this task. During these trials the 
subjects are seated in chairs placed in a circle so that each subject 
can see all other subjects. After each trial each subject counts the 
number of correct responses he made during the trial, e.g., the num- 
ber of letters printed during a one-minute interval. When all have 
determined this value, the experimenter starts around the circle ask- 
ing each subject to indicate clearly to all others how many he got 
correct and then how many he is going to “try for” on the next trial 
(level of aspiration). Then, another trial is given and after the num-
ber of correct responses is determined by each subject, each in turn
is asked to make known to all other subjects how many he said 
he was going to try for, how many he actually attained, and how 
many he is going to try for on the next trial. This procedure con- 
tinues for several trials. 

The control group, on the other hand, is simply given the equiva- 
lent number of trials in a formal situation. The subjects in this group 
are not allowed to count the number correct on each trial, are not 
allowed social interaction between trials, are seated in a formal 
classroom fashion, and so on. Comparison of performance on the 
series of trials usually shows the experimental group is superior; even 
these relatively sophisticated subjects usually respond to the con- 
ditions set up for the experimental group. But what do we have? 
We have a horribly nonanalytical experiment. Certainly the differ- 
ence in performance can be attributed to the difference in treat- 
ments of the two groups. But, even at our rather retarded stage of 
knowledge in the area of human motivation we can identify a 
number of factors (operating in the experimental conditions and not 
in the control) any one of which could conceivably produce the

be operative in a single experiment. If no difference in behavior
occurs as a consequence of the manipulation of this complex stimulus 
situation the investigator has in a single experiment eliminated three 
or four (or as many as he can identify) variables as being relevant 
for the behavior being studied. There is some danger, of course, that,
say, two of the variables might have opposite effects and thus
cancel, but usually a careful internal analysis of the data can detect 
such a possibility. If, on the other hand, he does find a difference in 
behavior, he knows the critical variable is among the several in- 
volved, or he knows that there is a combination which influences 
behavior, and he can proceed with analytical research to isolate the 
factor or factors involved. 
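
The logic of the preceding paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names and effect sizes are invented, and the effects are assumed to combine additively, which the text does not claim. The sketch shows the three outcomes discussed: several variables eliminated at once, a difference that calls for follow-up analytical research, and the danger that opposite effects cancel.

```python
# Hypothetical illustration of a "shotgun" experiment in which several
# variables are manipulated together. Variable names and effect sizes are
# invented; additive effects are an assumption made only for this sketch.

def group_difference(effects):
    """Net difference between experimental and control groups,
    assuming each variable contributes additively."""
    return sum(effects.values())


def interpret(effects, tolerance=1e-9):
    """Return the conclusion suggested by the net difference alone."""
    if abs(group_difference(effects)) <= tolerance:
        return "no difference: all variables judged irrelevant"
    return "difference: follow up with analytical research"


all_irrelevant = {"A": 0.0, "B": 0.0, "C": 0.0}  # none of the variables matters
one_relevant = {"A": 0.0, "B": 1.5, "C": 0.0}    # a single critical variable
cancelling = {"A": 2.0, "B": -2.0, "C": 0.0}     # opposite effects cancel

for label, effects in [("all irrelevant", all_irrelevant),
                       ("one relevant", one_relevant),
                       ("cancelling", cancelling)]:
    print(label, "->", interpret(effects))
```

In the cancelling case the net difference alone would wrongly eliminate two relevant variables, which is why the internal analysis of the data mentioned above is needed.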

The above procedures are not frequently used intentionally in 
psychological research; perhaps not as frequently as they should be. 
I think such research might be quite efficient under circumstances 
either where the investigator judges the variables not to be relevant 
but wants this “on the books” or where he has reason to believe
that there might be an important variable among several possible 
ones. And I might add that, as I shall point out more specifically
later, there are many instances in our science where an investigator 
prejudged a variable as not relevant only to find that it was highly 
relevant. 

I said above that it doesn’t seem to me that such procedures are 
used very frequently in psychological research. The investigator 
usually doesn’t indicate whether or not he realizes he has several
possible unitary factors involved in the stimulus complex. Clearly, 
if we are using this method as a means of exploring several variables 
we should indicate this in order to avoid being criticized for lumping 
all these factors together. The elimination of variables as relevant 
factors for behavior is, of course, a very worthwhile scientific 
enterprise since it is an integral part of analytical progress. While
there seems to be no a priori reason why we should ape the older
sciences, it is my understanding that this shotgun approach for 
demonstrating the irrelevancy of factors is used in these sciences 
and we might well study further the implications of such emulation. 
I am going to return to this matter very briefly in other contexts in 
this chapter.

by including them as separate terms in the analysis. Thus, such
analyses may achieve the same end as I have suggested would be 
achieved by throwing several variables into a stimulus complex. 
As a matter of fact, it is quite clear that considered use of analysis 
of variance can achieve this much more efficiently. But, while it is 
trite to say so, one gets out of an analysis of variance only an 
evaluation of those variables which are put into it and what one 
puts into it depends not only on statistical and design acumen but 
also psychological acumen concerning the range of factors which 
might influence the behavior. Furthermore, the subsidiary factors 
often inserted in analysis of variance are inserted not because of
an interest in their influence or lack of it but because a purer esti- 
mate of the error term for testing the major effects results from this 
insertion. The same factors so often become standard from experi- 
ment to experiment that the potentially powerful tool is not given a 
chance to evaluate the influence of new variables that have been
prejudged to be subsidiary. 

Regardless of how comprehensive an analysis of variance may be, 
I suspect that it is always possible to look at the findings from a 
different angle; it is always possible to slice, fractionate, or combine 
in new ways to obtain more information. And this is all I am urging; 
I think we should ask ourselves many, many questions about any 
set of data and see if these questions can be answered with the data 
at hand, thus avoiding having to do a new experiment. I am aware of 
the aversion that statisticians have for testing successive ad-hoc 
hypotheses from a set of data; but this fear is groundless if the 
investigator evaluates tests of these hypotheses with a judicious 
consideration for the statistical issues. 

2. I mentioned that certain standard analyses of variance patterns 
do provide tests of the influence of variables which the investigator 
would guess beforehand were probably not of much
moment for the behavior being studied. It is simply the responsibility 
of the investigator to put these into the design. Without additional 
reference to analysis of variance, I would like to discuss further 
this matter of determining the effect of many variables. Almost any 
research offers the possibility of answering questions concerning the 
influence of certain variables for which the research was not spe-

portant and we probably would still remain oblivious to it if we
had not established a habit of making many subsidiary analyses of
data in our laboratory. I do not mean to imply that this finding had
any earth-shaking consequences, but in my own very restricted area
of research it allowed a lot of puzzling findings to fall in line. 

3. Sticking too close to analyses dictated by the problem for 
which the experiment was designed has other unsatisfactory conse- 
quences. There are a number of published reports in the literature 
which take the following pattern. A problem is stated, the results 
analyzed around this problem, and a theoretical interpretation of 
the results is offered. Then, it is usually stated that subsequent re-
search will have to demonstrate whether or not this theoretical inter-
pretation is useful. But, if the investigator studied his procedures and
data carefully he would find that he already has data from the same
experiment which suggested the explanation which would test the
theory. But not perceiving that this is possible, he will go ahead (or 
someone else will) and do a new experiment to test the hypothesis 
which could have been tested by analysing the data in a different 
way from the already completed study. I think this is wasteful of
research energy. Of course I realize that support for the theory is 
of little consequence since it comes from the data which suggested 
it, but I am concerned with data gotten from subanalyses which are 
not in line with the theory. For, if such analyses had been made the 
interpretation actually made would not have been given in the 
report. 

Allied with this matter is another. As our thinking develops in a 
given area we may get explanatory ideas, or ideas for the importance 
of certain variables hitherto believed irrelevant, and we may have 
data available in the files to make at least a preliminary test. One of 
the cardinal sins in our laboratory is to throw away raw data once 
the major analysis and whatever subsidiary analyses we thought of 
at the time are completed. It may be three years, or ten years, or 
never that we arrive at a point in the development of an area where 
we ask questions that we judge important and which can be answered 
by subsidiary analysis of data already in our files. Many times these 
working hypotheses have turned out to be incorrect, but we have 
determined this without having to run a completely new experiment.

1. Suppose we wanted to find out the influence of a given variable
on a given response. We may, at the very simplest level, want to 
know only whether or not it is a relevant variable. So, we might use 
a simple two-condition experiment, choosing some value of the 
variable for the experimental condition and zero value for the con- 
trol. This allows operational definition of the phenomenon if there 
is one. Under usual circumstances we would probably get an answer 
to the question of whether or not the variable is relevant for the 
behavior being studied. But, negative results are not very convincing. 
We might have sampled the “wrong” value of the variable; its true 
relationship with behavior may be such that only small amounts or 
medium amounts will influence behavior. 

Now suppose, in view of the above considerations, we sample 
two values along the dimension, and historically this usually means 
two extreme points. We are reducing rather greatly the probabilities
that we will hit a “dead” spot, although in certain areas of behavior 
we might quite easily. But, even if we get positive results (if the 
results show the variable is relevant) we can say very little about the 
nature of the relationship. The relationship might be curvilinear, 
negatively accelerated, positively accelerated, linear, and so on, and 
we would not know it. Now, if we tap three points along the dimen- 
sion, say at the extremes and in the middle, the amount of informa- 
tion added is tremendous. We can be fairly confident that if we get 
no difference (over and above the control) the variable is not rele- 
vant, for it would be highly unlikely that we hit three dead spots. 
Furthermore, if it is a relevant variable, we have a fairly accurate 
estimate of the nature of the relationship although no one would 
deny that adding more points increases our confidence in this matter. 
So then, my first point is that except for pilot studies, if we are 
seriously asking about the influence of a given variable, we should 
tap the stimulus variable at least at three places, ideally widely spaced. 
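
The argument for tapping at least three points can be illustrated with a short Python sketch. Everything in it is hypothetical: the response function, the sampled values, and the detection threshold are invented for illustration, and the responses are treated as noise-free, which no real experiment would allow. The function has "dead" spots near both extremes and an effect only in the middle of the dimension.

```python
# Hypothetical, noise-free illustration of sampling a stimulus dimension.
# The response function, sampled values, and threshold are assumptions
# chosen only to show how extreme points can miss a relevant variable.

def response(x):
    """Invented response curve on [0, 1]: peaks at x = 0.5 and is flat
    (zero) near both extremes, so the extremes are "dead" spots."""
    return max(0.0, 1.0 - ((x - 0.5) / 0.3) ** 2)


def variable_looks_relevant(values, threshold=0.1):
    """Compare each sampled value with the zero-value control condition."""
    control = response(0.0)
    return any(abs(response(v) - control) > threshold for v in values)


one_point = [0.9]               # a single value of the variable
two_extremes = [0.1, 0.9]       # two extreme points
three_points = [0.1, 0.5, 0.9]  # extremes plus the middle

for label, values in [("one point", one_point),
                      ("two extremes", two_extremes),
                      ("three points", three_points)]:
    print(label, "->", variable_looks_relevant(values))
```

With this particular curve only the three-point design shows the variable to be relevant; a different curve could of course defeat three points as well, which is why adding more points increases confidence.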

2. The orthogonal design explores the influence of at least two
variables simultaneously. If we use three values along each dimension 
the result is two sets of relationships (as discussed above) plus the 
interaction effect (if any) of the two variables. The determination 
of interaction effects is important. I have a feeling that we have too 
long hidden ourselves behind the oft-given statement that behavior 
is so complex that we just can’t make progress very fast. One inter-

is 20 years old and deals only with human learning, it would be quite
unfair to use it as representative of contemporary thinking (of 
Melton and others) even though it may well be. So, I shall state a 
position which probably represents some scientists’ point of view 
and then I will order the discussion around the implications of the 
statement. 

Suppose you were going to do a study in an area in which con- 
siderable work had already been accomplished. A program of stand- 
ardization would require that unless you had specific reasons for 
doing otherwise you should keep the values of all static variables
equivalent to those values used in previous studies. By static variables 
I mean those factors which are known to influence the phenomenon 
under consideration or which conceivably could so influence but 
where their relationship to the phenomenon under consideration is 
of no concern for this research. Thus, if you were going to perform 
a study on tachistoscopic thresholds of verbal material as a function 
of meaningfulness of words, and if most other research had used 
the ascending method of limits, increasing exposure time rather than
brightness, and so on, the principle of standardization would say 
that these static variables should be the same as used in other studies. 
The series of points to follow concerning this matter will suggest 
implications of such procedural conformity, both positive and nega-
tive, as I see them.

1. I have used the term “value” above, saying that the value of
static variables should be kept the same as they had been in previous 
studies. Value thus implies a quantification of some sort. Such dupli- 
cation of quantitative values when physical scales are used would 
seem to be relatively straightforward but it isn’t always. Obviously, 
until standardization of calibration is achieved in such instances one 
cannot, even if desired, keep the static variables equivalent from one 
laboratory to another. I don’t think anyone would deny that it is 
a sad state of affairs when one wants to achieve standardization, 
thinks he is, but actually isn’t. 

When we are dealing with variables whose characteristics cannot 
be reflected by a physical scale, but which must be reflected by a 
psychological scale, we are in certain respects better off than when 
a physical scale is used. Thus, we can use nonsense syllables of speci-
fied degrees of meaningfulness as scaled by a particular investigator

standing is achieved within a very limited context but it may later
be broadened beyond this. 

3. Now, why is there an issue involved? Who could object to 
such a program of standardization? Let me first dispose of one 
objection which is occasionally raised, namely, that the principle 
of standardization is contrary to the principle of investigative free- 
dom. Such a criticism seems to me to result either from a misconcep- 
tion of what is meant by standardization or a misunderstanding of 
freedom of inquiry. There is nothing in a principle of standardiza- 
tion which limits one’s area or which prevents him from exploring 
the effects of any variable one wants to. All the principle says is 
that we should have continuity of variables if we are not interested 
in the influence of those variables. I don’t see how even the most 
rabid protector of research freedom would say that this abrogates 
or threatens this freedom. The principle says that here are factors 
in which you are not interested; wouldn’t it be worthwhile to handle 
these in such a way so that different researches in different labora- 
tories may become a common body of data? It seems to me that if 
anyone would object to this on the grounds of a threat to research 
freedom alone he is showing somewhat irresponsible behavior. 

4. I think objections may be raised to the principle of standardiza-
tion which are serious, legitimate, and which must be considered
before one embraces the principle. The objection comes from a 
consideration of the objectives of science and how those objectives 
can be most efficiently achieved. To advance the argument system- 
atically will take a little preparation. 

With some fear of reprisals, I must say again that we have noted 
that at the empirical level the number of variables which might in- 
fluence behavior is very great. Actually, there seems to be no way 
to avoid in the long run of our science the task of somehow de- 
termining which factors are relevant and which are not. It is compli- 
cated further by the fact that as we well know a variable may be 
relevant for one form of behavior and not for others. Then there 
are the interactions about which we have spoken. In the face of 
this gigantic task one might throw up his hands in despair; indeed, 
some have done so and taken up other pursuits. But, there are many 
others who have gone about their research, got hold of a restricted 
range of phenomena, and have gone about the analyses of these

variables which are likely to be most relevant. We then start about
our business of isolating the effects of these variables. All others, in-
cluding those which we judged to be irrelevant, are fixed at a given
level throughout the series of researches; they become the static 
variables of the situation. It is quite obvious that a tight, highly sys- 
tematic body of data could be built up around this situation. Now, 
let us suppose that an investigator at another university gets interested
in the area of research. The standardization dictum would say that 
he should duplicate the values of the static variables of the first 
investigator. If he did this, the two sets of data could be readily 
merged. 

The generality of the phenomena studied within the highly re- 
stricted set of conditions as outlined is unknown. But this generality 
could be determined over a period of time by ticking off one vari- 
able after another by a series of researches; that is, one could now 
investigate the influence of variables which had previously been 
static. Subject characteristics may be varied, task characteristics and 
so on. But I would venture an opinion that this procedure is in- 
efficient. If I take a position that we should not standardize in this 
sense, and I do take such a position, I advance the following sorts 
of arguments. 

We prejudge many stimulus conditions to be irrelevant and we 
are frequently wrong. Supposing that I, as the second investigator
in the above situation, did not adhere to the principle of standardiza- 
tion. So I do an experiment which repeats one of the researches of 
the previous investigator (at least in part) except that I deliberately 
use different values of, say, six static variables, whether I judge them
to be relevant or not. Now supposing I obtain the same results the 
previous investigator obtained. Barring the cancellation effects of 
variables having opposite influences, what have I achieved? It seems 
to me that as discussed earlier I have shown in a single experiment 
what would have taken the other investigator six experiments to 
show if he adhered to his philosophy. If, on the other hand, I do 
indeed find a difference between his results and mine, I have identi- 
fied a pertinent variable which he may have judged, at least tenta- 
tively, to be irrelevant and I can now proceed with more analytical 
research to determine which variable or combination is responsible. 
Under no circumstances would I ever have to do more work than

on. I find such suggestions quite revolting. I have therefore con-
cluded that there is little I can say in a positive way which I can
condone and so am left moaning over our writing ills. 

But, still talking at the level of platitudes, I am convinced that
writing scientific prose is a skill which develops with practice and 
knowledge of results provided by someone who is judged to be able
to write straightforward research reports. Our conviction in the 
worthwhileness of this platitude is strong enough so that both our
undergraduate and graduate students are given practice in writing
manuscripts (over and above theses) and these are meticulously
marked for clarity of expression, organization, and so on. It would 
be nice to have a control condition for this treatment; if we did I 
suspect our faith in the value of the practice might be considerably
shaken. Nevertheless, I suspect there are many worse ways for 
students and instructors to spend their time in a program designed 
to train scientists. 

There is an issue, however, regarding the reporting of research, 
which I feel is worth a little space. Some of the issues on which there 
are varying shades of opinion which we have discussed in this book 
were resolved by appealing to a criterion of efficiency in our scien- 
tific pursuits. I don't know for sure that efficiency is a legitimate 
criterion to use in deciding issues even if it is the only differentiating 
factor between two positions. I have obviously used it in some cases
to aid in arriving at a point of view. This does not, of course, give 
it a sanctified status even though our culture nearly forces such a 
status on us. Nevertheless, since science is a part of our culture, and 
since saving in manpower may result from efficiency in science as 
well as in an A & P store, I do not think we should disregard it as a 
criterion. Be this reasonable or not, I would like to mention briefly 
certain ideas about the scientists’ obligation to report research. 

If our universities grant us time, money, and freedom to do re- 
search as we choose, we have certain responsibilities in return. One 
of these obligations is that of making available to the scientific pub-
lic the results of our investigations. I frankly get a little disgusted
hearing by word of mouth that so and so did an experiment five 
years ago and found exactly the same as I found in an experiment 
just completed. The other investigator never got around to publish-
ing the results of his research; if he had I never would have done

findings; whatever the motives are which lead to this I am for; those
that prevent it, I am against. 


THE EMPIRICAL INTEGRATOR 

My major point of departure in this book has been problems 
facing the worker actively engaged in planning, executing, and 
interpreting research. It is from his activities, when properly carried 
out, that we gain initial status as a scientific discipline. But, data, 
facts, and relationships cannot be allowed to lie undigested on the 
pages of our journals. One of the purposes of theory is to provide 
this assimilative function. But I do not believe that explanatory at- 
tempts are satisfactorily handling this function at the present time. 
If I correctly assess the current situation, it is that we have vast 
bodies of data even within areas within psychology which desper- 
ately need to be brought into some sort of integrative scheme. And 
the data are being spewed out at ever-increasing rates. What are 
the dangers of this? 

In some of the older sciences, great quantities of research findings 
remain unincorporated within theories. In certain areas within these 
sciences there is grave danger that many of these data will be lost 
to successive generations of students. When such mounds of data 
remain unintegrated no graduate student can, without a nearly 
prohibitive amount of time and effort, be expected to master these 
facts. Consequently, the student tends to shunt himself toward 
more newly developed areas where the background study for his 
own research can be accomplished with the expenditure of a reason- 
able amount of time and effort. The upshot of this is that some of 
the older areas of research are no longer attracting research workers, 
not because the problems in those areas are unimportant, but because 
it is so difficult “to get ahold” of the area in a manner befitting a 
scholar. 

If I read the signs correctly, certain areas of psychology may be 
approaching this point. I note that what might be called “fads” in research 
areas develop and that we will have a spate of Ph.D. theses 
in these areas. Thus the work on perceptual thresholds and proba- 
bility learning seems to fit this category. The amount of background 
reading necessary to do well-informed research in these areas is 
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allotted time behaving like a venerable seer, for that is what I fear 
I have been doing in this book. 
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